Partner Solution

Contextualized OT data pipeline into Databricks

How can we reliably collect vendor-agnostic OT data from the shop floor, enrich it with asset and production context at the edge, and deliver a governed, AI-ready copy into Databricks for advanced analytics, Industrial AI and enterprise consumption?

Overview

Siemens Industrial Edge to Databricks

Connect shop-floor equipment from any vendor to Industrial Edge via preconfigured connectors.
Create a secure and reliable connection between Siemens Industrial Edge devices and the Databricks Data Intelligence Platform, using FFT DataBridge for streaming ingestion via Zerobus into Delta tables governed by Unity Catalog.
Store and manage harmonized industrial data in the Databricks Lakehouse for scalable analytics and Industrial AI.
Perform advanced analytics and train AI models on the harmonized data layer, including OEE monitoring, predictive maintenance, quality optimization and agentic AI applications. Deploy models back to Industrial Edge for low-latency execution.

In this hybrid edge-to-cloud setup, Industrial Edge ingests and enriches OT data streams, aligning and contextualizing telemetry and events at the source before FFT DataBridge forwards them to Databricks through streaming ingestion. Within Databricks, the data is transformed and structured across landing, curated and analytics tiers. These tiers form the enterprise lakehouse foundation for advanced analytics, Industrial AI, model development and lifecycle management, and operational applications, and they enable integration with MES, ERP and SCADA environments. The overall approach is designed for AI readiness, trusted and consistent data, robust security, high resilience and open, vendor-neutral interoperability.

Detailed architecture

Edge collection and contextualization (Industrial Edge)

Industrial Edge runs on on-premises devices close to the shop floor and connects to automation equipment from any vendor via OT connectors (OPC UA, Modbus, EtherNet/IP, etc.). It acquires raw telemetry, alarms and events.

At the edge, data is pre-processed: filtering, compression, timestamp normalization, enrichment with asset metadata (asset hierarchies, work order / batch context), and local aggregation to reduce cloud bandwidth.
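The pre-processing steps listed above can be illustrated with a minimal Python sketch. It is not how Industrial Edge implements them: the `Reading` type, the `ASSET_CONTEXT` lookup, the deadband rule and the window size are all hypothetical names and choices made for illustration. The sketch normalizes timestamps to UTC, drops readings that barely change, averages values per time window and attaches asset metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import mean

# Hypothetical asset metadata; a real deployment would read this from its asset model.
ASSET_CONTEXT = {
    "press_01/temperature": {
        "site": "plant_a", "line": "line_1", "asset": "press_01", "unit": "degC",
    },
}


@dataclass
class Reading:
    tag: str
    ts: datetime
    value: float


def normalize_ts(ts: datetime) -> datetime:
    """Treat naive timestamps as UTC and convert aware ones to UTC (an assumption)."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def deadband_filter(readings, deadband):
    """Keep a reading only if it differs from the last kept value by more than deadband."""
    kept = []
    for r in readings:
        if not kept or abs(r.value - kept[-1].value) > deadband:
            kept.append(r)
    return kept


def aggregate(readings, window_s):
    """Average values per tag and fixed time window, then attach asset context."""
    buckets = {}
    for r in readings:
        ts = normalize_ts(r.ts)
        start = int(ts.timestamp()) // window_s * window_s
        buckets.setdefault((r.tag, start), []).append(r.value)
    out = []
    for (tag, start), values in sorted(buckets.items()):
        out.append({
            "tag": tag,
            "window_start": datetime.fromtimestamp(start, tz=timezone.utc).isoformat(),
            "avg": mean(values),
            "count": len(values),
            **ASSET_CONTEXT.get(tag, {}),
        })
    return out
```

Calling `aggregate(deadband_filter(readings, 0.1), 60)` would forward one enriched record per tag and minute instead of every raw sample, which is the bandwidth reduction the text refers to.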

An internal databus (MQTT / Unified Namespace) or the Industrial Information Hub propagates harmonized topic streams to downstream components and local consumers.
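As a rough illustration of what a harmonized topic stream might look like, the sketch below builds a hierarchical MQTT topic and a JSON payload. The level names (enterprise, site, area, line, asset, signal) and the payload fields are assumptions made for this example, not a layout defined by Industrial Edge or the Industrial Information Hub. The only protocol facts relied on are that MQTT uses `/` as the level separator and reserves `+` and `#` as wildcards.

```python
import json


def uns_topic(enterprise, site, area, line, asset, signal):
    """Build a hierarchical topic; the level layout is an assumed convention."""
    parts = [enterprise, site, area, line, asset, signal]
    for p in parts:
        # '/' separates levels and '+' / '#' are MQTT wildcards, so none may
        # appear inside a single level of a published topic.
        if not p or any(c in p for c in "/+#"):
            raise ValueError(f"invalid topic level: {p!r}")
    return "/".join(parts)


def uns_payload(value, unit, ts_iso, quality="good"):
    """Serialize one contextualized value; the field names are hypothetical."""
    return json.dumps(
        {"value": value, "unit": unit, "ts": ts_iso, "quality": quality},
        sort_keys=True,
    )
```

A consistent hierarchy lets local consumers subscribe with wildcards, for example to every signal of one line, without knowing each asset in advance.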

Protocol and format bridging

FFT DataBridge (Edge App) prepares and enriches data for streaming, near-real-time ingestion into Databricks. Its free companion app, FFT DataService, reads contextualized data from Industrial Information Hub Essentials (Edge App) and makes it available to FFT DataBridge. FFT DataBridge then publishes aligned, contextualized data streams via Zerobus, delivering them continuously and directly into tables governed by Unity Catalog.

To ensure robustness, the solution uses in-memory buffering and local persistence to bridge connectivity interruptions and extended outages. On the Databricks side, data is ingested incrementally into Delta tables under Unity Catalog, enabling governed, low-latency access for downstream analytics and AI workloads. Secure connectivity is maintained through token-based or key-based authentication mechanisms.
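The buffering behaviour described above can be sketched as a small store-and-forward queue. This is an illustrative pattern only, not the FFT DataBridge implementation, whose internals are not described here. The class name `StoreAndForward`, the JSON-lines spill file and the `send` callable are hypothetical. The sketch also uses a bounded in-memory queue that spills to disk when full rather than a ring that overwrites old entries.

```python
import json
import os
from collections import deque


class StoreAndForward:
    """Bounded in-memory queue that spills to a JSON-lines file while the uplink is down."""

    def __init__(self, send, spill_path, max_in_memory=1000):
        self._send = send              # callable(record); raises ConnectionError on failure
        self._spill_path = spill_path  # local persistence file (hypothetical location)
        self._memory = deque()
        self._max = max_in_memory

    def put(self, record):
        """Queue a record, moving older records to disk when memory is full."""
        if len(self._memory) >= self._max:
            self._append_spill(self._memory)
            self._memory.clear()
        self._memory.append(record)

    def _append_spill(self, records):
        with open(self._spill_path, "a", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")

    def _load_spill(self):
        if not os.path.exists(self._spill_path):
            return []
        with open(self._spill_path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def flush(self):
        """Try to deliver everything oldest-first; return the number of records sent."""
        pending = self._load_spill() + list(self._memory)
        self._memory.clear()
        sent = 0
        try:
            for record in pending:
                self._send(record)
                sent += 1
        except ConnectionError:
            pass  # uplink still down; keep the rest for the next attempt
        # Persist whatever was not delivered so it survives a restart.
        if os.path.exists(self._spill_path):
            os.remove(self._spill_path)
        if pending[sent:]:
            self._append_spill(pending[sent:])
        return sent
```

Calling `flush()` on a timer gives a simple retry loop. Because a crash between a successful send and the file rewrite could resend a record, this sketch offers at-least-once delivery, so the receiving side would need to tolerate duplicates.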

Databricks Data Intelligence Platform

Streaming ingestion via Zerobus continuously delivers data into Databricks, where incoming OT payloads are written into Bronze Delta tables governed by Unity Catalog, preserving raw structure and metadata for full traceability and auditability.

Transformation pipelines built with Lakeflow Declarative Pipelines, Databricks Workflows, and Apache Spark progressively refine the data into Silver (curated) and Gold (analytical) layers, supporting time alignment, contextual enrichment, and readiness for BI consumption as well as AI-driven use cases.
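To make the layering concrete, the sketch below walks one payload from raw (Bronze) to typed (Silver) to an aggregated OEE figure (Gold), using OEE monitoring from the overview as the example metric. Plain Python stands in for Spark and Lakeflow Declarative Pipelines so the example stays self-contained. The field names (`planned_s`, `run_s`, `ideal_cycle_s`, `total`, `good`) are hypothetical, and the formula is the standard OEE = availability × performance × quality.

```python
import json

# Hypothetical fields expected in each raw payload.
REQUIRED = ("asset", "planned_s", "run_s", "ideal_cycle_s", "total", "good")


def bronze_to_silver(raw_payloads):
    """Parse raw JSON strings, drop malformed or incomplete ones, and type the fields."""
    rows = []
    for raw in raw_payloads:
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if not isinstance(rec, dict) or any(k not in rec for k in REQUIRED):
            continue
        rows.append({
            "asset": str(rec["asset"]),
            "planned_s": float(rec["planned_s"]),
            "run_s": float(rec["run_s"]),
            "ideal_cycle_s": float(rec["ideal_cycle_s"]),
            "total": int(rec["total"]),
            "good": int(rec["good"]),
        })
    return rows


def silver_to_gold(rows):
    """Aggregate per asset and compute availability, performance, quality and OEE."""
    totals = {}
    for r in rows:
        t = totals.setdefault(
            r["asset"],
            {"planned_s": 0.0, "run_s": 0.0, "ideal_s": 0.0, "total": 0, "good": 0},
        )
        t["planned_s"] += r["planned_s"]
        t["run_s"] += r["run_s"]
        t["ideal_s"] += r["ideal_cycle_s"] * r["total"]
        t["total"] += r["total"]
        t["good"] += r["good"]
    gold = {}
    for asset, t in totals.items():
        if t["planned_s"] <= 0 or t["run_s"] <= 0 or t["total"] <= 0:
            continue  # not enough data to compute a meaningful ratio
        availability = t["run_s"] / t["planned_s"]
        performance = t["ideal_s"] / t["run_s"]
        quality = t["good"] / t["total"]
        gold[asset] = {
            "availability": availability,
            "performance": performance,
            "quality": quality,
            "oee": availability * performance * quality,
        }
    return gold
```

Keeping the raw strings untouched in Bronze means a Silver rule such as the required-field check can be changed later and replayed over the full history.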

AI models are developed and trained centrally in Databricks using MLflow and Mosaic AI, and can then be deployed back to Siemens Industrial Edge for low-latency execution close to the shop floor—enabling closed-loop optimization and physical AI scenarios.

Unity Catalog enforces end-to-end governance, including fine-grained access control, data masking and lineage tracking. The Lakehouse Platform runs natively on AWS, Microsoft Azure and Google Cloud Platform, supporting cross-cloud deployment and data mobility.

Values & benefits

Unified, AI-ready manufacturing data foundation

Databricks stores raw and curated OT data in an open lakehouse, enabling consistent analytics, Industrial AI and cross-factory insights from a single governed platform.

Reduced cloud cost and bandwidth

Edge-side aggregation, filtering and intelligent batching by FFT DataBridge lower data volume and cloud ingestion costs while preserving fidelity where needed.

Accelerated industrial AI and advanced analytics

MLflow, Mosaic AI and Databricks SQL accelerate feature engineering, model training and deployment for predictive maintenance, quality optimization and energy management.

Operational resilience

Edge buffering (ring memory, file persistence) and retry logic in FFT DataBridge maintain data continuity during network outages without data loss.

Security and governance

End-to-end encryption, token-based authentication, role-based access and Unity Catalog controls protect sensitive operational data with full lineage tracking.

Closed-loop AI from cloud to edge

AI models trained in Databricks can be deployed back to Industrial Edge for low-latency decision-making, enabling physical AI and autonomous manufacturing.

Components

Industrial Edge

Hosts OT connectors, apps, local databus and buffer storage.

Responsible for secure acquisition, enrichment, filtering, aggregation and orchestration.

Managed centrally via Industrial Edge Management.

Industrial connectors & Industrial Information Hub

Industrial connectors from the Siemens Industrial Edge Ecosystem unlock shop-floor data by connecting assets from different vendors.

Industrial Information Hub enables the harmonization, contextualization and mapping of industrial data.

FFT DataBridge (Edge app)

Buffers data to bridge outages and interruptions, and handles retries, batching and packaging.

Streams industrial data via Zerobus into Databricks Delta tables governed by Unity Catalog.

No extra ingestion middleware needed.

Databricks Data Intelligence Platform

Bronze layer: preserves original payloads and metadata as Delta tables for auditability.

Silver/Gold layers: structured, curated datasets used for reporting and operational apps.

Near-real-time ingestion via Zerobus.