IT Engineer - Data Lakehouse

3-6 Years

This job is no longer accepting applications

  • Posted 2 months ago
  • Over 100 applicants

Job Description

Job Summary

  • Design, develop, and operate scalable and maintainable data pipelines in the Azure Databricks environment
  • Develop all technical artifacts as code, implemented in professional IDEs, with full version control and CI/CD automation
  • Enable data-driven decision-making in Human Resources (HR), Purchasing (PUR) and Finance (FIN) by ensuring high data availability, quality, and reliability
  • Implement data products and analytical assets using software engineering principles in close alignment with business domains and functional IT
  • Apply rigorous software engineering practices such as modular design, test-driven development, and artifact reuse in all implementations
  • Global delivery footprint; cross-functional data engineering support across HR, PUR & FIN domains
  • Collaboration with business stakeholders, functional IT partners, product owners, architects, ML/AI engineers, and Power BI developers
  • Agile, product-team structure embedded in an enterprise-scale Azure environment
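The "modular design, test-driven development, and artifact reuse" principle above can be illustrated with a minimal sketch: a pure, reusable transformation kept separate from I/O so it can be unit-tested without a cluster. The function and field names (`normalize_cost_center`, the zero-padding rule) are invented for this sketch, not taken from the job description.

```python
# Hypothetical example of a small, reusable transformation unit.
# Keeping it pure (no Spark session, no I/O) makes it trivially
# testable in CI, which is the core of the TDD practice named above.

def normalize_cost_center(raw: str) -> str:
    """Normalize a cost-center code: trim, upper-case, zero-pad to 10 chars."""
    code = raw.strip().upper()
    return code.zfill(10)


def test_normalize_cost_center():
    # TDD-style unit test: written against the contract, runnable in CI.
    assert normalize_cost_center("  fin42 ") == "00000FIN42"
```

In a Databricks context the same function would be applied inside a PySpark UDF or column expression; the point of the pattern is that the business rule itself stays framework-independent and reusable.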

Main Tasks:

  • Design scalable batch and streaming pipelines in Azure Databricks using PySpark and/or Scala
  • Implement ingestion from structured and semi-structured sources (e.g., SAP, APIs, flat files)
  • Build bronze/silver/gold data layers following the defined lakehouse layering architecture & governance
  • Implement use-case driven dimensional models (star/snowflake schema) tailored to HR, PUR & FIN needs
  • Ensure compatibility with reporting tools (e.g., Power BI) via curated data marts and semantic models
  • Implement enterprise-level data warehouse models (domain-driven 3NF models) for HR, PUR & FIN data, closely aligned with data engineers for other business domains
  • Develop and apply master data management strategies (e.g., Slowly Changing Dimensions)
  • Develop automated data validation tests using frameworks
  • Monitor pipeline health, identify anomalies, and implement quality thresholds
  • Establish data quality transparency by defining meaningful data quality rules together with source-system and business stakeholders, and by implementing the related reports
  • Develop and structure pipelines using modular, reusable code in a professional IDE
  • Apply test-driven development (TDD) principles with automated unit, integration, and validation tests
  • Integrate tests into CI/CD pipelines to enable fail-fast deployment strategies
  • Commit all artifacts to version control with peer review and CI/CD integration
  • Work closely with Product Owners to refine user stories and define acceptance criteria
  • Translate business requirements into data contracts and technical specifications
  • Participate in agile events such as sprint planning, reviews, and retrospectives
  • Document pipeline logic, data contracts, and technical decisions in markdown or auto-generated docs from code
  • Align designs with governance and metadata standards (e.g., Unity Catalog)
  • Track lineage and audit trails through integrated tooling
  • Profile and tune data transformation performance
  • Reduce job execution times and optimize cluster resource usage
  • Refactor legacy pipelines or inefficient transformations to improve scalability
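The bronze/silver/gold layering named in the tasks above can be sketched as three small stage functions. In the actual Azure Databricks environment each stage would read and write Delta tables with PySpark; plain-Python lists of dicts are used here only to keep the sketch self-contained, and all source, field, and function names are hypothetical.

```python
# Minimal sketch of medallion (bronze/silver/gold) layering.
# Assumption: list-of-dicts stands in for Delta tables; names are invented.

def to_bronze(raw_rows):
    """Bronze: land data as-is, only tagging each record with its source."""
    return [{**row, "_source": "sap_hr"} for row in raw_rows]

def to_silver(bronze_rows):
    """Silver: cleanse and conform - drop rows missing a key, normalize types."""
    out = []
    for row in bronze_rows:
        if row.get("emp_id") is None:
            continue  # quality rule: employee id is mandatory
        out.append({**row, "emp_id": str(row["emp_id"]).strip()})
    return out

def to_gold(silver_rows):
    """Gold: business-level aggregate - headcount per department."""
    counts = {}
    for row in silver_rows:
        counts[row["dept"]] = counts.get(row["dept"], 0) + 1
    return counts

raw = [{"emp_id": 1, "dept": "FIN"}, {"emp_id": None, "dept": "HR"},
       {"emp_id": 2, "dept": "FIN"}]
gold = to_gold(to_silver(to_bronze(raw)))
# gold == {"FIN": 2}: the row with a missing emp_id never reaches gold
```

The design point is that each layer has one responsibility (land, cleanse, aggregate), so quality rules and business logic can be tested and evolved independently.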
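The Slowly Changing Dimensions task above is usually implemented in Databricks as a Delta `MERGE`; the underlying Type 2 logic (close the current version, append a new one) can be shown in a plain-Python sketch. All keys, attributes, and dates below are illustrative.

```python
# Simplified sketch of an SCD Type 2 upsert, assuming an in-memory
# dimension table. In Databricks this would be a Delta MERGE statement.
from datetime import date

def scd2_upsert(dim_rows, incoming, today):
    """Type 2 slowly changing dimension upsert.

    dim_rows: dicts with keys key, attr, valid_from, valid_to, current.
    incoming: dict with keys key, attr.
    A changed attribute closes the current version and appends a new one.
    """
    for row in dim_rows:
        if row["key"] == incoming["key"] and row["current"]:
            if row["attr"] == incoming["attr"]:
                return dim_rows  # no change, keep history as-is
            row["current"] = False      # close the old version
            row["valid_to"] = today
            break
    dim_rows.append({"key": incoming["key"], "attr": incoming["attr"],
                     "valid_from": today, "valid_to": None, "current": True})
    return dim_rows

dim = [{"key": "E1", "attr": "HR", "valid_from": date(2023, 1, 1),
        "valid_to": None, "current": True}]
dim = scd2_upsert(dim, {"key": "E1", "attr": "FIN"}, date(2024, 6, 1))
# dim now holds two versions of E1: a closed HR row and a current FIN row
```

Keeping closed versions rather than overwriting them is what lets downstream HR, PUR, and FIN reports reconstruct "as-of" history.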
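The automated data validation and quality-threshold tasks above can be sketched as simple rule functions plus a fail-fast gate. In practice a framework such as Great Expectations or Databricks expectations would fill this role; the rule names and threshold values here are illustrative only.

```python
# Sketch of automated data validation with a CI/CD fail-fast gate.
# Assumption: rules are plain functions over lists of dicts; a real
# pipeline would run equivalent checks against Spark DataFrames.

def check_not_null(rows, field):
    """Rule: every row must have a non-null value for `field`."""
    bad = [r for r in rows if r.get(field) is None]
    return {"rule": f"not_null({field})", "failed": len(bad), "passed": not bad}

def check_threshold(results, max_failed=0):
    """Fail-fast gate: reject the batch if any rule exceeds the threshold."""
    return all(r["failed"] <= max_failed for r in results)

rows = [{"emp_id": 1}, {"emp_id": None}]
results = [check_not_null(rows, "emp_id")]
# check_threshold(results) is False: one null emp_id breaches the threshold
```

Wiring `check_threshold` into the CI/CD pipeline is what turns quality rules into the fail-fast deployment strategy the tasks describe: a batch that breaches a threshold never reaches the gold layer.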

More Info

Job Type:
Industry:
Function:
Employment Type:
Open to candidates from:
India

Job ID: 120573019