Partner Solution

Contextualized OT data pipeline into Databricks

How can we reliably collect vendor-agnostic OT data from the shop floor, enrich it with asset and production context at the edge, and deliver a governed, AI-ready copy into Databricks for advanced analytics, Industrial AI and enterprise consumption?

Overview


Siemens Industrial Edge to Databricks

  1. Connect vendor-agnostic shop-floor equipment to Industrial Edge via pre-configured connectors.
  2. Create a secure and reliable connection between Siemens Industrial Edge devices and the Databricks Data Intelligence Platform using FFT DataBridge, which streams contextualized data via Zerobus into Unity Catalog–governed Delta tables.
  3. Store and manage harmonized industrial data in the Databricks Lakehouse for scalable analytics and Industrial AI.
  4. Perform advanced analytics and train AI models on the harmonized data layer, including OEE monitoring, predictive maintenance, quality optimization and agentic AI applications. Deploy models back to Industrial Edge for low-latency execution.
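OEE monitoring, one of the use cases in step 4, is conventionally computed as Availability × Performance × Quality. The Python sketch below implements that standard definition; the `ShiftCounts` fields and the sample numbers are hypothetical inputs (for example, values aggregated in a Gold table), not part of the Siemens or Databricks products.

```python
from dataclasses import dataclass


@dataclass
class ShiftCounts:
    """Hypothetical per-shift inputs, e.g. read from a Gold table."""
    planned_time_min: float      # planned production time
    run_time_min: float          # planned time minus downtime
    ideal_cycle_time_min: float  # ideal time to produce one unit
    total_count: int             # all units produced
    good_count: int              # units that passed quality checks


def oee(c: ShiftCounts) -> float:
    """OEE = Availability x Performance x Quality (0.0 if nothing was produced)."""
    if c.planned_time_min <= 0 or c.run_time_min <= 0 or c.total_count <= 0:
        return 0.0
    availability = c.run_time_min / c.planned_time_min
    performance = c.ideal_cycle_time_min * c.total_count / c.run_time_min
    quality = c.good_count / c.total_count
    return availability * performance * quality


# 480 min planned, 432 min running, 1 min ideal cycle, 400 units, 380 good
print(round(oee(ShiftCounts(480, 432, 1.0, 400, 380)), 4))  # 0.7917
```

Here availability is 0.9, performance about 0.926 and quality 0.95; the guard returns 0.0 instead of dividing by zero for shifts without production.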

The result is a hybrid edge-to-cloud setup. Industrial Edge ingests OT data streams and aligns, contextualizes and enriches telemetry and events at the source; FFT DataBridge then forwards them into Databricks through streaming ingestion. Within Databricks, the data is transformed and structured across landing, curated and analytics tiers, forming the enterprise lakehouse foundation for advanced analytics, Industrial AI, model development and lifecycle management, and operational applications, as well as for integration with MES, ERP and SCADA environments. The overall approach is designed to ensure AI readiness, trusted and consistent data, robust security, high resilience and open, vendor-neutral interoperability.

Detailed architecture

    Edge collection and contextualization (Industrial Edge)

    Industrial Edge runs on on-premises devices close to the shop floor and connects to vendor-agnostic automation equipment via OT connectors (OPC UA, Modbus, EtherNet/IP, etc.) to acquire raw telemetry, alarms and events.
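Connectors differ mainly in how raw values arrive: an OPC UA server exposes typed node values, whereas a Modbus device exposes 16-bit registers that must be decoded. As a rough illustration only (the register layout, addresses and `RawSample` fields are assumptions, not the Industrial Edge connector API), this Python sketch decodes a 32-bit float stored across two holding registers and wraps both sources in one protocol-neutral record:

```python
import struct
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Sequence, Union


@dataclass
class RawSample:
    """Hypothetical protocol-neutral record produced by any connector."""
    source: str                          # e.g. "opcua" or "modbus"
    address: str                         # node id or register address
    value: Union[float, int, bool, str]
    ts: datetime                         # acquisition time


def decode_float32(high: int, low: int) -> float:
    """Combine two 16-bit registers into an IEEE-754 float.

    Word order varies between devices; high word first is assumed here.
    """
    return struct.unpack(">f", struct.pack(">HH", high, low))[0]


def from_modbus(address: int, regs: Sequence[int]) -> RawSample:
    return RawSample("modbus", f"HR{address}", decode_float32(regs[0], regs[1]),
                     datetime.now(timezone.utc))


def from_opcua(node_id: str, value: float, ts: datetime) -> RawSample:
    return RawSample("opcua", node_id, value, ts)


# 0x42C8 0x0000 is the big-endian float32 encoding of 100.0
print(from_modbus(40001, [0x42C8, 0x0000]).value)  # 100.0
```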

    At the edge, data is pre-processed: filtering, compression, timestamp normalization, enrichment with asset metadata (asset hierarchies, work order / batch context), and local aggregation to reduce cloud bandwidth.
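The pre-processing steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the Industrial Edge implementation: the per-tag deadband filter, the `ASSET_CONTEXT` lookup table and the one-minute aggregation window are all hypothetical choices, and compression is omitted.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

# Hypothetical asset context, e.g. loaded from an asset model at the edge
ASSET_CONTEXT = {
    "press01.temp": {"site": "plant-a", "line": "line-3", "asset": "press-01"},
}


def to_utc(ts: datetime, assume_tz: timezone = timezone.utc) -> datetime:
    """Normalize timestamps: attach a zone to naive values, then convert to UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)


def deadband(samples, threshold):
    """Per tag, drop samples that changed less than `threshold` since the last kept one."""
    last = {}
    for tag, ts, value in samples:
        if tag not in last or abs(value - last[tag]) >= threshold:
            last[tag] = value
            yield tag, ts, value


def enrich(tag, ts, value):
    """Attach asset hierarchy context; unknown tags get no extra fields."""
    return {"tag": tag, "ts": to_utc(ts), "value": value, **ASSET_CONTEXT.get(tag, {})}


def aggregate_per_minute(records):
    """Average values per tag and minute to reduce cloud bandwidth."""
    buckets, context = defaultdict(list), {}
    for r in records:
        minute = r["ts"].replace(second=0, microsecond=0)
        buckets[(r["tag"], minute)].append(r["value"])
        context[r["tag"]] = {k: v for k, v in r.items() if k not in ("ts", "value")}
    return [
        {**context[tag], "minute": minute, "avg": mean(vals), "count": len(vals)}
        for (tag, minute), vals in sorted(buckets.items())
    ]


cet = timezone(timedelta(hours=1))
raw = [
    ("press01.temp", datetime(2024, 5, 1, 9, 0, 5, tzinfo=cet), 80.0),
    ("press01.temp", datetime(2024, 5, 1, 9, 0, 20, tzinfo=cet), 80.1),  # inside deadband
    ("press01.temp", datetime(2024, 5, 1, 9, 0, 40, tzinfo=cet), 82.0),
]
rows = aggregate_per_minute(enrich(*s) for s in deadband(raw, threshold=0.5))
print(rows)
```

Filtering before aggregation means the 80.1 reading never leaves the device, and the remaining two readings are sent as one averaged row stamped in UTC.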

    An internal databus (MQTT / Unified Namespace) or Industrial Information Hub propagates harmonized topic streams for downstream components and local consumers.
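A Unified Namespace usually organizes topics along an ISA-95-style hierarchy (enterprise, site, area, line, asset). The Python sketch below shows one possible topic and payload convention; the hierarchy levels, field names and JSON layout are assumptions, and the actual MQTT publish call is left out because it depends on the client library in use.

```python
import json
from datetime import datetime, timezone


def uns_topic(enterprise, site, area, line, asset, signal):
    """Build a hierarchical MQTT topic.

    '/' separates levels, and '+' and '#' are MQTT wildcards that are not
    allowed in published topic names, so all three are rejected in a level.
    """
    levels = [enterprise, site, area, line, asset, signal]
    for level in levels:
        if not level or any(ch in level for ch in "/+#"):
            raise ValueError(f"invalid topic level: {level!r}")
    return "/".join(levels)


def uns_payload(value, unit, ts, quality="good", **context):
    """Serialize a harmonized sample plus its context as compact JSON."""
    return json.dumps(
        {"value": value, "unit": unit, "ts": ts.isoformat(), "quality": quality, **context},
        separators=(",", ":"),
        sort_keys=True,
    )


topic = uns_topic("acme", "plant-a", "pressing", "line-3", "press-01", "temperature")
payload = uns_payload(81.0, "degC", datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc),
                      batch_id="B-1042")
print(topic)  # acme/plant-a/pressing/line-3/press-01/temperature
print(payload)
```

Carrying context such as `batch_id` inside the payload keeps the topic tree stable while still letting consumers join telemetry to production orders.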

    Protocol and format bridging

    FFT DataBridge (Edge App) prepares and enriches data for streaming, near-real-time ingestion into Databricks. Its free companion app, FFT DataService, reads contextualized data from Industrial Information Hub Essentials (Edge App) and makes it available to FFT DataBridge. FFT DataBridge then publishes aligned, contextualized data streams via Zerobus, delivering them continuously and directly into Unity Catalog–governed tables.
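Delivering data directly into governed tables implies that every record must match the target table's schema before it is sent. The sketch below is a hypothetical illustration of that conformance and batching step; `TARGET_SCHEMA` and the `send` callable are assumptions standing in for whatever client FFT DataBridge actually uses, and no Zerobus SDK calls are shown.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable

# Hypothetical Bronze table schema: column name -> Python type
TARGET_SCHEMA = {"tag": str, "ts": str, "value": float, "asset": str, "batch_id": str}


def conform(record: dict) -> dict:
    """Keep only schema columns, coerce types and fail fast on missing columns."""
    missing = [c for c in TARGET_SCHEMA if c not in record]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    row = {}
    for col, typ in TARGET_SCHEMA.items():
        val = record[col]
        if col == "ts" and isinstance(val, datetime):
            # Timezone-aware datetimes become ISO-8601 strings in UTC
            val = val.astimezone(timezone.utc).isoformat()
        row[col] = typ(val)
    return row


def publish(records: Iterable[dict], send: Callable[[list], None], batch_size: int = 100):
    """Group conformed rows into batches and hand each batch to `send`."""
    batch = []
    for r in records:
        batch.append(conform(r))
        if len(batch) >= batch_size:
            send(batch)
            batch = []
    if batch:
        send(batch)


sent = []
publish(
    [{"tag": "press01.temp", "ts": datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc),
      "value": "81", "asset": "press-01", "batch_id": "B-1042", "extra": 1}],
    send=sent.append,
)
print(sent)
```

Rejecting malformed records at the edge keeps schema errors out of the cloud tables; extra fields such as `extra` are dropped rather than forwarded.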

    To ensure robustness, the solution uses in-memory buffering and local persistence to bridge connectivity interruptions and extended outages. On the Databricks side, data is ingested incrementally into Delta tables under Unity Catalog, enabling governed, low-latency access for downstream analytics and AI workloads. Secure connectivity is maintained through token-based or key-based authentication mechanisms.
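The buffering behavior described above follows the classic store-and-forward pattern. A minimal Python sketch, assuming a local SQLite file as the persistence layer and a hypothetical `send` callable that raises `ConnectionError` while the link is down; the actual FFT DataBridge mechanism may differ:

```python
import json
import sqlite3
from collections import deque
from typing import Callable


class StoreAndForward:
    """Keep recent records in memory; spill the oldest to disk when memory is full.

    Records held only in memory are lost if the process crashes; a stricter
    design would write every record to disk first.
    """

    def __init__(self, path: str, send: Callable[[list], None], mem_limit: int = 1000):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox (id INTEGER PRIMARY KEY, body TEXT)")
        self.mem = deque()
        self.send = send
        self.mem_limit = mem_limit

    def put(self, record: dict) -> None:
        self.mem.append(record)
        if len(self.mem) > self.mem_limit:
            self._persist(self.mem.popleft())  # long outage: move oldest to disk

    def _persist(self, record: dict) -> None:
        with self.db:  # commits the insert
            self.db.execute("INSERT INTO outbox (body) VALUES (?)", (json.dumps(record),))

    def flush(self) -> bool:
        """Send the disk backlog (oldest data) first, then memory; True if all sent."""
        try:
            rows = self.db.execute("SELECT id, body FROM outbox ORDER BY id").fetchall()
            if rows:
                self.send([json.loads(body) for _, body in rows])
                with self.db:
                    self.db.execute("DELETE FROM outbox WHERE id <= ?", (rows[-1][0],))
            if self.mem:
                self.send(list(self.mem))
                self.mem.clear()
            return True
        except ConnectionError:
            return False  # keep everything buffered and retry later


class FlakyLink:
    """Test double for a connection that can go down."""

    def __init__(self):
        self.up, self.delivered = False, []

    def send(self, batch):
        if not self.up:
            raise ConnectionError("link down")
        self.delivered.extend(batch)


link = FlakyLink()
buf = StoreAndForward(":memory:", link.send, mem_limit=2)
for i in range(5):
    buf.put({"seq": i})
print(buf.flush())  # False
link.up = True
print(buf.flush())  # True
print([r["seq"] for r in link.delivered])  # [0, 1, 2, 3, 4]
```

Sending the disk backlog before the in-memory tail preserves the original ordering across an outage, which simplifies time alignment downstream.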

    Databricks data intelligence platform

    Streaming ingestion via Zerobus continuously delivers data into Databricks, where incoming OT payloads are written into Bronze Delta tables governed by Unity Catalog, preserving raw structure and metadata for full traceability and auditability.
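Preserving raw structure and metadata in Bronze typically means storing the payload untouched alongside ingestion metadata. This Python sketch shows one hypothetical Bronze row layout; the column names are assumptions, and in Databricks such rows would be appended to a Delta table rather than built by hand.

```python
import hashlib
import json
from datetime import datetime, timezone


def to_bronze_row(topic: str, payload: bytes, source_device: str) -> dict:
    """Wrap an unparsed payload with lineage metadata; nothing is interpreted yet."""
    return {
        "raw_payload": payload.decode("utf-8", errors="replace"),
        "source_topic": topic,
        "source_device": source_device,
        "ingest_ts": datetime.now(timezone.utc).isoformat(),
        # A content hash lets downstream layers detect replays and duplicates
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
    }


msg = json.dumps({"value": 81.0, "unit": "degC"}).encode()
row = to_bronze_row("acme/plant-a/pressing/line-3/press-01/temperature", msg, "edge-07")
print(row["payload_sha256"][:12], row["raw_payload"])
```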

    Transformation pipelines built with Lakeflow Declarative Pipelines, Databricks Workflows and Apache Spark progressively refine the data into Silver (curated) and Gold (analytical) layers, handling time alignment and contextual enrichment and preparing the data for BI consumption as well as AI-driven use cases.
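The Bronze-to-Silver-to-Gold refinement can be illustrated in plain Python. In production this logic would be expressed in Lakeflow Declarative Pipelines or Spark; the sketch only mirrors the idea, and its payload fields, `(topic, ts)` deduplication key and five-minute alignment window are assumptions.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def to_silver(bronze_rows):
    """Parse raw payloads, drop malformed rows and deduplicate on (topic, ts)."""
    seen, silver = set(), []
    for row in bronze_rows:
        try:
            p = json.loads(row["raw_payload"])
            # Payload timestamps are assumed to carry a UTC offset
            ts = datetime.fromisoformat(p["ts"]).astimezone(timezone.utc)
            value = float(p["value"])
        except (ValueError, KeyError, TypeError):
            continue  # a real pipeline would route these to a quarantine table
        key = (row["source_topic"], ts)
        if key not in seen:
            seen.add(key)
            silver.append({"topic": row["source_topic"], "ts": ts, "value": value})
    return silver


def to_gold(silver_rows, window_min=5):
    """Align samples to fixed windows and compute per-window statistics."""
    buckets = defaultdict(list)
    for r in silver_rows:
        ts = r["ts"]
        start = ts.replace(minute=ts.minute - ts.minute % window_min, second=0, microsecond=0)
        buckets[(r["topic"], start)].append(r["value"])
    return [
        {"topic": t, "window_start": w, "min": min(v), "max": max(v),
         "avg": sum(v) / len(v), "n": len(v)}
        for (t, w), v in sorted(buckets.items())
    ]


topic = "acme/plant-a/pressing/line-3/press-01/temperature"
bronze = [
    {"source_topic": topic, "raw_payload": '{"ts": "2024-05-01T08:01:00+00:00", "value": 80}'},
    {"source_topic": topic, "raw_payload": '{"ts": "2024-05-01T08:01:00+00:00", "value": 80}'},
    {"source_topic": topic, "raw_payload": '{"ts": "2024-05-01T08:03:30+00:00", "value": 84}'},
    {"source_topic": topic, "raw_payload": "not json"},
]
gold = to_gold(to_silver(bronze))
print(gold)
```

The replayed message and the malformed row are removed in Silver, so the Gold window reflects exactly two distinct readings.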

    AI models are developed and trained centrally in Databricks using MLflow and Mosaic AI, and can then be deployed back to Siemens Industrial Edge for low-latency execution close to the shop floor, enabling closed-loop optimization and physical AI scenarios.
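As a deliberately simple stand-in for the MLflow and Mosaic AI workflow (which is not reproduced here), the sketch below "trains" a z-score anomaly detector on historical values in the cloud, exports its parameters as a portable JSON artifact, and loads that artifact in an edge-side scorer. The artifact format and the threshold of 3.0 are hypothetical; a real deployment would package the model with MLflow and run it inside an Industrial Edge app.

```python
import json
from statistics import mean, stdev
from typing import Sequence


def train(history: Sequence[float], threshold: float = 3.0) -> str:
    """Cloud side: fit mean and standard deviation, export them as JSON."""
    return json.dumps({"mean": mean(history), "std": stdev(history),
                       "threshold": threshold, "version": 1})


class EdgeScorer:
    """Edge side: load the artifact once and score each new sample locally."""

    def __init__(self, artifact: str):
        params = json.loads(artifact)
        self.mean, self.std = params["mean"], params["std"]
        self.threshold = params["threshold"]

    def is_anomaly(self, value: float) -> bool:
        if self.std == 0:
            return value != self.mean
        return abs(value - self.mean) / self.std > self.threshold


artifact = train([80.0, 81.0, 79.5, 80.5, 80.0, 81.5, 79.0])
scorer = EdgeScorer(artifact)
print(scorer.is_anomaly(80.2), scorer.is_anomaly(95.0))  # False True
```

Because only the small parameter artifact crosses the network, scoring continues at the edge even when the cloud connection is interrupted.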

    Unity Catalog enforces end-to-end governance, including fine-grained access control, data masking, and lineage tracking, while the Lakehouse Platform runs natively across AWS, Microsoft Azure, and Google Cloud Platform, supporting cross-cloud deployment and seamless data mobility.

    Values & benefits

    Components