Key Responsibilities
Architect Integrated Solutions:
- Lead architectural design and implementation across edge devices, cloud infrastructure, and machine learning workflows, spanning the raw-to-gold data layers.
Build and Govern the Data Platform:
- Manage data ingestion, transformation, and cataloging across medallion architecture zones (Raw, Bronze, Silver, Gold), aligned with Unity Catalog for governance.
Enable a Scalable ML Platform:
- Support ML teams by developing and maintaining infrastructure for feature storage, model operations, deployment, and monitoring.
Edge Integration and Automation:
- Design secure and scalable OT-IT integration using Docker, Portainer, RabbitMQ, OPC UA, and edge-to-cloud communication practices, including an Industrial Demilitarized Zone (IDMZ).
Monitor and Optimize Pipelines:
- Implement real-time monitoring for ETL and ML pipelines using tools such as Prometheus, and optimize workloads for performance and cost efficiency.
Governance and Security Compliance:
- Enforce data governance standards, including access control, metadata tagging, and enterprise compliance across zones using Unity Catalog and Azure-native tools.
Lead CI/CD Automation:
- Automate the deployment of platform components and ML workflows using Azure DevOps, GitHub Actions, and self-hosted runners in a monorepo structure.
Technical Expertise
Azure Cloud & DevOps:
- Azure Data Factory (ADF) for orchestration
- Azure Databricks (ADB) with ML workspace: Feature Store, Model Registry
- Azure Data Lake Storage (ADLS) using medallion architecture
- Azure Event Hubs: topic design, consumer groups, ETL integration
- Azure Stream Analytics for real-time telemetry data
- Azure Key Vault, App Service, Container Registry (ACR)
- Azure IoT Hub for edge device integration
- Azure DevOps & GitHub Actions for CI/CD automation
- GitHub self-hosted runners for workflow management
Edge and On-Prem Integration:
- Edge VM deployment using Docker and Portainer
- Messaging via RabbitMQ (read/write from edge)
- OPC UA for PLC integration (e.g., FX Filter, NH3 Compressor)
- IDMZ architecture for secure edge-to-cloud integration
Machine Learning Platform & MLOps:
- End-to-end ML lifecycle: Feature engineering, model training, validation, deployment
- Monitoring of deployed models at high frequency (e.g., 1-minute intervals)
- Cloud vs. edge deployment strategy and deployment cadence management (weekly, monthly, quarterly)
- MLflow, ADB ML workspace, monorepo structures for model code
Data Architecture & Integration:
- Implementation of medallion architecture
- Integration with Unity Catalog for data governance and sharing
- Real-time SAP ingestion using CDC tools (e.g., Aecorsoft)
- Streaming and API-based data ingestion
- Template-driven ingestion and mapping using configurations
- Consumption layer design for ML, BI, and operational reporting
Governance & Data Modeling:
- Define and implement data governance policies
- Scalable data model design for operational analytics and ML feature generation
- Metadata tagging, access control, and quality enforcement across data layers