Design, develop, and operate scalable and maintainable data pipelines in the Azure Databricks environment
Develop all technical artifacts as code, implemented in professional IDEs, with full version control and CI/CD automation
Enable data-driven decision-making in Human Resources (HR), Purchasing (PUR) and Finance (FIN) by ensuring high data availability, quality, and reliability
Implement data products and analytical assets using software engineering principles in close alignment with business domains and functional IT
Apply rigorous software engineering practices such as modular design, test-driven development, and artifact reuse in all implementations
Global delivery footprint; cross-functional data engineering support across HR, PUR & FIN domains
Collaboration with business stakeholders, functional IT partners, product owners, architects, ML/AI engineers, and Power BI developers
Agile, product-team structure embedded in an enterprise-scale Azure environment
Main Tasks:
Design scalable batch and streaming pipelines in Azure Databricks using PySpark and/or Scala
Implement ingestion from structured and semi-structured sources (e.g., SAP, APIs, flat files)
Build bronze/silver/gold data layers following the defined lakehouse (medallion) layering architecture and its governance standards, as sketched below
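For illustration, a minimal PySpark sketch of the bronze-to-silver pattern, assuming a Databricks session where `spark` is predefined; all paths, column names, and table names (dev.bronze.hr_events etc.) are hypothetical placeholders:

    from pyspark.sql import functions as F

    # Bronze: incremental file ingestion with Auto Loader.
    bronze = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/Volumes/raw/_schemas/hr_events")
        .load("/Volumes/raw/hr_events/")
        .withColumn("_ingested_at", F.current_timestamp())
    )
    (
        bronze.writeStream
        .option("checkpointLocation", "/Volumes/raw/_checkpoints/hr_events")
        .trigger(availableNow=True)  # process all pending files, then stop
        .toTable("dev.bronze.hr_events")
    )

    # Silver: cleaned, deduplicated records ready for modelling.
    (
        spark.read.table("dev.bronze.hr_events")
        .dropDuplicates(["event_id"])
        .filter(F.col("employee_id").isNotNull())
        .write.mode("overwrite").saveAsTable("dev.silver.hr_events")
    )

The same pattern scales from availableNow batch runs to continuous streaming by changing only the trigger.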
Implement use-case driven dimensional models (star/snowflake schema) tailored to HR, PUR & FIN needs
Ensure compatibility with reporting tools (e.g., Power BI) via curated data marts and semantic models
Implement enterprise-level data warehouse models (domain-driven 3NF models) for HR, PUR & FIN data, in close alignment with data engineers from other business domains
Develop and apply master data management and historization strategies (e.g., Slowly Changing Dimensions; a minimal Type 2 sketch follows)
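A minimal SCD Type 2 sketch using a Delta Lake MERGE, assuming `spark` is predefined and that the hypothetical dev.silver.supplier_changes table already contains only new or changed records:

    from delta.tables import DeltaTable
    from pyspark.sql import functions as F

    dim = DeltaTable.forName(spark, "dev.gold.dim_supplier")
    updates = spark.read.table("dev.silver.supplier_changes")

    # Step 1: close the current version of every changed supplier.
    (
        dim.alias("d")
        .merge(updates.alias("u"),
               "d.supplier_id = u.supplier_id AND d.is_current = true")
        .whenMatchedUpdate(set={
            "is_current": "false",
            "valid_to": "current_timestamp()",
        })
        .execute()
    )

    # Step 2: append new versions (and first versions of new suppliers).
    (
        updates
        .withColumn("is_current", F.lit(True))
        .withColumn("valid_from", F.current_timestamp())
        .withColumn("valid_to", F.lit(None).cast("timestamp"))
        .write.mode("append").saveAsTable("dev.gold.dim_supplier")
    )

Keeping valid_from/valid_to plus an is_current flag lets marts serve both point-in-time and current-state queries.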
Develop automated data validation tests using established frameworks (e.g., Great Expectations or plain PySpark checks)
Monitor pipeline health, identify anomalies, and implement quality thresholds
Establish data quality transparency: define meaningful data quality rules together with source-system and business stakeholders, implement them, and build the related reports (a minimal sketch follows)
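A minimal sketch of rule-based checks in plain PySpark, assuming a Databricks session; the rules and table names (dev.monitoring.dq_results etc.) are hypothetical:

    from pyspark.sql import DataFrame, functions as F

    def run_dq_checks(df: DataFrame) -> dict:
        # Illustrative rules; real rules are agreed with stakeholders.
        return {
            "null_employee_id": df.filter(F.col("employee_id").isNull()).count(),
            "duplicate_event_id": df.count() - df.dropDuplicates(["event_id"]).count(),
            "future_event_date": df.filter(F.col("event_date") > F.current_date()).count(),
        }

    metrics = run_dq_checks(spark.read.table("dev.silver.hr_events"))

    # Persist results so violations stay visible in quality reports.
    spark.createDataFrame(
        [(rule, count) for rule, count in metrics.items()],
        "rule STRING, violations BIGINT",
    ).write.mode("append").saveAsTable("dev.monitoring.dq_results")

    # Fail fast: stop the pipeline when any rule is violated.
    failed = {rule: n for rule, n in metrics.items() if n > 0}
    if failed:
        raise ValueError(f"Data quality checks failed: {failed}")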
Develop and structure pipelines using modular, reusable code in a professional IDE
Apply test-driven development (TDD) principles with automated unit, integration, and validation tests (a minimal unit test is sketched after this group)
Integrate tests into CI/CD pipelines to enable fail-fast deployment strategies
Commit all artifacts to version control with peer review and CI/CD integration
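As an example of the TDD loop, a minimal pytest unit test for a hypothetical transformation, runnable locally and inside a CI/CD pipeline:

    import pytest
    from pyspark.sql import SparkSession, functions as F

    def normalize_cost_center(df):
        # Hypothetical transformation under test.
        return df.withColumn("cost_center", F.upper(F.trim(F.col("cost_center"))))

    @pytest.fixture(scope="session")
    def spark():
        return (SparkSession.builder
                .master("local[1]").appName("unit-tests").getOrCreate())

    def test_normalize_cost_center(spark):
        df = spark.createDataFrame([("  fin-001 ",)], ["cost_center"])
        assert normalize_cost_center(df).first()["cost_center"] == "FIN-001"

Because the test fails before deployment, broken transformations never reach shared environments.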
Work closely with Product Owners to refine user stories and define acceptance criteria
Translate business requirements into data contracts and technical specifications
Participate in agile events such as sprint planning, reviews, and retrospectives
Document pipeline logic, data contracts, and technical decisions in Markdown or in docs auto-generated from code (a contract-as-code sketch follows)
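One way to make a data contract executable is to express it as a schema in code; the contract below and its helper function are purely illustrative:

    from pyspark.sql import DataFrame
    from pyspark.sql.types import StructType, StructField, StringType, DateType

    # Hypothetical contract: these columns and types are the agreed
    # interface with downstream consumers.
    HR_EVENTS_CONTRACT = StructType([
        StructField("event_id", StringType(), nullable=False),
        StructField("employee_id", StringType(), nullable=False),
        StructField("event_type", StringType(), nullable=False),
        StructField("event_date", DateType(), nullable=True),
    ])

    def assert_contract(df: DataFrame, contract: StructType) -> None:
        actual = {f.name: f.dataType for f in df.schema.fields}
        expected = {f.name: f.dataType for f in contract.fields}
        missing = set(expected) - set(actual)
        mismatched = {c for c in expected if c in actual and actual[c] != expected[c]}
        if missing or mismatched:
            raise TypeError(f"Contract violation. Missing: {missing}; mismatched: {mismatched}")

Schemas and docstrings kept in the repository can then feed auto-generated documentation, keeping docs and code in sync.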
Align designs with governance and metadata standards (e.g., Unity Catalog)
Track lineage and audit trails through integrated tooling (see the Unity Catalog sketch below)
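A sketch of governance as code under Unity Catalog's three-level namespace, assuming a Databricks session; the catalog, schema, table, and group names are illustrative. Unity Catalog records lineage for queries over such tables automatically:

    spark.sql("CREATE CATALOG IF NOT EXISTS hr_prod")
    spark.sql("CREATE SCHEMA IF NOT EXISTS hr_prod.gold")
    spark.sql("""
        CREATE TABLE IF NOT EXISTS hr_prod.gold.dim_employee (
            employee_sk BIGINT,
            employee_id STRING,
            is_current  BOOLEAN
        )
    """)
    spark.sql("GRANT SELECT ON TABLE hr_prod.gold.dim_employee TO `hr_analysts`")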
Profile and tune data transformation performance (typical tuning levers are sketched at the end of this list)
Reduce job execution times and optimize cluster resource usage
Refactor legacy pipelines or inefficient transformations to improve scalability
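A few typical tuning levers, sketched in PySpark with hypothetical table names:

    from pyspark.sql import functions as F

    # Let adaptive query execution right-size shuffle partitions at runtime.
    spark.conf.set("spark.sql.adaptive.enabled", "true")

    # Broadcast the small dimension to avoid shuffling the large fact table.
    fact = spark.read.table("dev.gold.fct_purchases")
    dim = spark.read.table("dev.gold.dim_supplier")
    joined = fact.join(F.broadcast(dim), "supplier_sk")

    # Compact small files and co-locate rows on a frequent filter column.
    spark.sql("OPTIMIZE dev.gold.fct_purchases ZORDER BY (purchase_date)")

The Spark UI's stage-level metrics (shuffle read/write, spill, skew) indicate which lever matters for a given job.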