Project Role Summary
We are looking for an experienced Senior Data Engineer (8+ years) to design, develop, and optimize data pipelines and storage layers in a Medallion Architecture on Microsoft Azure. The ideal candidate will build scalable ETL/ELT pipelines while ensuring data governance, security, and performance optimization for batch and real-time data processing.
This role requires expertise in Azure Data Factory (ADF), Azure Databricks, Delta Lake, Microsoft Purview, Unity Catalog, ADLS Gen2, and Python/PySpark to transform raw API-based data into curated, structured data marts.
Key Responsibilities
- Design, document, and implement scalable ETL/ELT pipelines to process data from APIs into the Bronze, Silver, and Gold layers.
- Design, document, develop, and optimize batch and streaming data pipelines using Azure Data Factory, Azure Databricks, and Azure Event Hubs.
- Implement incremental loading strategies (CDC, Delta Lake merge/upsert) to efficiently manage historical and real-time data.
- Optimize Azure Synapse Analytics queries for analytical performance.
- Design efficient storage solutions leveraging Azure Data Lake Storage Gen2 and Delta Lake.
- Build and maintain dimensional models (Star Schema, Snowflake Schema) in the Gold layer for analytical and reporting use cases.
- Develop fact and dimension tables, ensuring referential integrity, indexing, and partitioning.
- Implement data validation, schema enforcement, and quality checks across the Bronze, Silver, and Gold layers.
- Ensure compliance with data governance frameworks using Microsoft Purview.
- Implement Role-Based Access Control (RBAC), encryption, and data masking for secure data handling.
- Optimize ETL pipelines, queries, and Databricks clusters for cost efficiency.
- Implement Azure Monitor Log Analytics for real-time data pipeline monitoring.
- Fine-tune partitioning, caching, and indexing strategies for high-performance analytics.
- Work closely with Data Architects, Analysts, BI Developers, and DevOps teams to ensure smooth data integration.
- Establish CI/CD pipelines for data engineering (Azure DevOps, GitHub Actions).
- Document data pipelines, models, and transformations in a structured data dictionary.
Technical Skills
- Strong experience with Azure Data Factory (ADF) and orchestration of ETL pipelines
- Strong experience with Azure Databricks, PySpark, and Python
- Strong experience with Delta Lake, including optimized storage, versioning, and ACID transactions
- Strong experience with SQL-based analytical processing
- Strong experience writing and optimizing ETL/ELT workflows
- Strong experience with streaming frameworks (Azure Stream Analytics, Event Hubs)
- Strong experience with dimensional modeling (Star Schema, Snowflake Schema)
- Strong experience with data partitioning, indexing, and query performance tuning
- Strong experience with Microsoft Purview and Unity Catalog for data lineage and metadata management
- Strong experience with RBAC, data masking, and encryption to protect sensitive information
- Strong experience with cost-efficient Databricks cluster management
- Strong experience with CI/CD pipelines for data pipeline deployment using Azure DevOps
Soft Skills
- Strong analytical and problem-solving mindset.
- Ability to collaborate with cross-functional teams.
- Excellent documentation skills.
- Good communication skills.
Nice to Have
- Microsoft Azure Data Engineer Associate (DP-203) certification
- Databricks Certified Data Engineer certification
- Infrastructure as Code (IaC) using Terraform.