We are seeking a seasoned Data Engineer to design, build, and operate scalable, reliable, production‑grade data pipelines on Azure. The role requires deep, hands‑on expertise in PySpark, Python, Azure Synapse, and modern data orchestration frameworks (Dagster preferred), along with a strong understanding of distributed data processing, cloud-based data lake architectures, and end‑to‑end pipeline ownership, from raw data ingestion to analytics‑ready datasets.
Immediate joiners highly preferred (0–30 days) | Location: Pune / Gurgaon (Hybrid: 2–3 days WFO)
Key competencies:
6+ years of experience in core data engineering.
Core Technical Skills
- Strong, hands-on PySpark experience in production environments
- Advanced Python skills for data engineering (beyond notebooks and scripts)
- Practical experience with Azure Synapse Analytics (Pipelines + Spark)
- Solid understanding of distributed systems and cloud data processing
Data Engineering Foundations
- Strong grasp of ETL / ELT design patterns
- Experience with large-scale data lakes (ADLS Gen2 or equivalent)
- Excellent SQL skills for transformation and analytics workflows
- Experience handling structured and semi-structured data
Key Responsibilities:
Distributed Data Engineering (PySpark)
- Design and build distributed data pipelines using PySpark on large-scale datasets
- Develop and optimize pipelines using:
  - Spark DataFrames and Spark SQL
  - Partitioning, caching, and join optimization strategies
  - Efficient read/write patterns for cloud object storage
- Diagnose and remediate performance bottlenecks including skew, shuffle issues, and memory pressure
- Ensure pipelines meet SLAs for latency, reliability, and data quality
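Skew remediation is called out above; the core idea of key salting (spreading one hot join key across several buckets) can be sketched in plain Python. Key names and the salt count are illustrative; in PySpark the same expressions would typically be built with `withColumn` and a random salt column.

```python
import random

NUM_SALTS = 8  # number of buckets to spread a hot key across (illustrative)

def salt_key(key: str, num_salts: int = NUM_SALTS) -> str:
    """Append a random salt so rows sharing one hot key land in several partitions."""
    return f"{key}#{random.randrange(num_salts)}"

def explode_dim_key(key: str, num_salts: int = NUM_SALTS) -> list:
    """Replicate the dimension-side key once per salt so the join still matches."""
    return [f"{key}#{i}" for i in range(num_salts)]

# A skewed fact table: most rows share the hot key "IN"
fact_keys = ["IN"] * 6 + ["US", "DE"]
salted = [salt_key(k) for k in fact_keys]

# Every salted fact key has a matching replicated dimension key
dim_side = set(explode_dim_key("IN") + explode_dim_key("US") + explode_dim_key("DE"))
assert all(s in dim_side for s in salted)
```

The trade-off is a `NUM_SALTS`-fold replication of the dimension side, which is why salting is applied per hot key rather than globally.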
Python Engineering for Data Platforms
- Write production-grade Python code for data pipelines, transformations, and utilities
- Apply strong engineering practices:
  - Modular code structure and reusable libraries
  - Dependency and package management
  - Logging, error handling, and testability
- Build framework-agnostic pipeline logic where appropriate (decoupled from orchestration layer)
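As an illustration of the practices above, a minimal sketch of a transformation utility with structured logging, explicit error handling, and a pure, testable core. All names here are hypothetical, not part of the role's actual codebase.

```python
import logging

logger = logging.getLogger("pipeline.transforms")

class DataQualityError(ValueError):
    """Raised when a record fails a required-field check."""

def normalize_record(record: dict) -> dict:
    """Pure, framework-agnostic transform: unit-testable with no Spark or orchestrator imports."""
    if "id" not in record:
        raise DataQualityError("record is missing required field 'id'")
    return {"id": str(record["id"]), "name": str(record.get("name", "")).strip().lower()}

def normalize_batch(records: list) -> list:
    """Apply the transform, logging and skipping bad records instead of failing the whole run."""
    out = []
    for rec in records:
        try:
            out.append(normalize_record(rec))
        except DataQualityError as exc:
            logger.warning("skipping bad record %r: %s", rec, exc)
    return out
```

Keeping `normalize_record` free of Spark and orchestrator imports is what makes the logic portable across runtimes, which is the point of the framework-agnostic bullet above.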
Azure Synapse & Cloud Data Integration
- Hands-on development with Azure Synapse Analytics, including:
  - Synapse Pipelines (data movement and orchestration)
  - Synapse Notebooks (Spark-based transformations)
  - Linked Services and Integration Runtimes
- Participate in or lead data platform migrations (on-prem → Azure, legacy ETL → Synapse / Spark)
- Optimize pipeline execution for cost, scalability, and reliability in Azure environments
- Integrate Synapse with ADLS Gen2, SQL pools, and downstream analytics tools
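For the ADLS Gen2 integration above, Spark and Synapse address the lake through `abfss://` URIs. A hypothetical helper for building medallion-layer paths; the account, container, and layer names are illustrative, only the URI scheme is the real ADLS Gen2 convention.

```python
LAYERS = {"bronze", "silver", "gold"}  # medallion-style curated layers

def adls_path(container: str, account: str, layer: str, dataset: str) -> str:
    """Build an abfss:// URI for a dataset in a given medallion layer on ADLS Gen2."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer {layer!r}, expected one of {sorted(LAYERS)}")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{layer}/{dataset}"

# A Synapse Spark notebook would then read, e.g.:
# df = spark.read.parquet(adls_path("lake", "contosodata", "silver", "orders"))
```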
Data Orchestration & Dagster (Preferred)
- Design and operate Dagster-based orchestration using asset-centric patterns
- Implement:
  - Software-defined data assets
  - Sensors and schedules for event-driven and time-based processing
  - Dependency-aware pipeline execution
- Monitor and debug pipeline runs, lineage, and failures
- (Preferred) Experience deploying and operating pipelines in Dagster+ (formerly Dagster Cloud) or similar managed environments
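The dependency-aware execution that asset-centric orchestrators like Dagster provide can be sketched with the standard library's topological sorter. The asset names are illustrative; real Dagster code would declare `@asset` definitions and let the framework derive this ordering.

```python
from graphlib import TopologicalSorter

# Each asset maps to the set of upstream assets it depends on (hypothetical graph)
asset_deps = {
    "raw_orders": set(),
    "clean_orders": {"raw_orders"},
    "orders_by_region": {"clean_orders"},
    "daily_report": {"clean_orders", "orders_by_region"},
}

def run_order(deps: dict) -> list:
    """Return an execution order in which every asset runs after all its upstreams."""
    return list(TopologicalSorter(deps).static_order())

order = run_order(asset_deps)
assert order.index("raw_orders") < order.index("clean_orders")
assert order.index("orders_by_region") < order.index("daily_report")
```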
Core Data Engineering & Architecture
- Design robust ETL / ELT pipelines aligned to analytics and reporting needs
- Apply solid data modeling principles (fact/dimension models, curated layers, semantic consistency)
- Architect and maintain data lake solutions using ADLS Gen2 and medallion-style layering
- Write and optimize SQL for transformations, validation, and analytics consumption
- Enforce data quality, schema evolution controls, and observability across the platform
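A minimal sketch of the SQL-for-transformation-and-validation work above, using in-memory SQLite in place of Synapse SQL pools; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# Transformation: curated, analytics-ready aggregate over a fact/dimension join
rows = conn.execute("""
    SELECT d.category, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_product d USING (product_id)
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

# Validation: every fact row must resolve to a known dimension key
orphans = conn.execute("""
    SELECT COUNT(*) FROM fact_sales f
    LEFT JOIN dim_product d USING (product_id)
    WHERE d.product_id IS NULL
""").fetchone()[0]

print(rows)  # [('books', 15.0), ('games', 7.5)]
assert orphans == 0
```

The orphan-count query is the kind of referential-integrity check that feeds the data-quality enforcement mentioned above.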
Interested candidates, please share your updated CVs at [Confidential Information]