We are seeking a seasoned Data Engineer to design, build, and operate scalable, reliable, production‑grade data pipelines on Azure. The role requires deep, hands‑on expertise in PySpark, Python, Azure Synapse, and modern data orchestration frameworks (Dagster preferred), along with a strong understanding of distributed data processing, cloud-based data lake architectures, and end‑to‑end pipeline ownership, from raw data ingestion to analytics‑ready datasets.
Immediate joiners highly preferred (0–30 days) | Location: Pune / Gurgaon (Hybrid: 2–3 days WFO)
Key competencies:
6+ years of experience in core data engineering.
Core Technical Skills
- Strong, hands-on PySpark experience in production environments
- Advanced Python skills for data engineering (beyond notebooks and scripts)
- Practical experience with Azure Synapse Analytics (Pipelines + Spark)
- Solid understanding of distributed systems and cloud data processing
Data Engineering Foundations
- Strong grasp of ETL / ELT design patterns
- Experience with large-scale data lakes (ADLS Gen2 or equivalent)
- Excellent SQL skills for transformation and analytics workflows
- Experience handling structured and semi-structured data
Key Responsibilities:
Distributed Data Engineering (PySpark)
- Design and build distributed data pipelines using PySpark on large-scale datasets
- Develop and optimize pipelines using:
  - Spark DataFrames and Spark SQL
  - Partitioning, caching, and join optimization strategies
  - Efficient read/write patterns for cloud object storage
- Diagnose and remediate performance bottlenecks including skew, shuffle issues, and memory pressure
- Ensure pipelines meet SLAs for latency, reliability, and data quality
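Skew remediation is called out above; the core idea of key salting (spreading one hot join key across several buckets) can be sketched in plain Python. Key names and the salt count are illustrative; in PySpark the same expressions would typically be built with `withColumn` and a random salt column.

```python
import random

NUM_SALTS = 8  # number of buckets to spread a hot key across (illustrative)

def salt_key(key: str, num_salts: int = NUM_SALTS) -> str:
    """Append a random salt so rows sharing one hot key land in several partitions."""
    return f"{key}#{random.randrange(num_salts)}"

def explode_dim_key(key: str, num_salts: int = NUM_SALTS) -> list:
    """Replicate the dimension-side key once per salt so the join still matches."""
    return [f"{key}#{i}" for i in range(num_salts)]

# A skewed fact table: most rows share the hot key "IN"
fact_keys = ["IN"] * 6 + ["US", "DE"]
salted = [salt_key(k) for k in fact_keys]

# Every salted fact key has a matching replicated dimension key
dim_side = set(explode_dim_key("IN") + explode_dim_key("US") + explode_dim_key("DE"))
assert all(s in dim_side for s in salted)
```

The trade-off is a `NUM_SALTS`-fold replication of the dimension side, which is why salting is applied per hot key rather than globally.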
Python Engineering for Data Platforms
- Write production-grade Python code for data pipelines, transformations, and utilities
- Apply strong engineering practices:
  - Modular code structure and reusable libraries
  - Dependency and package management
  - Logging, error handling, and testability
- Build framework-agnostic pipeline logic where appropriate (decoupled from orchestration layer)
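As an illustration of the practices above, a minimal sketch of a transformation utility with structured logging, explicit error handling, and a pure, testable core. All names here are hypothetical, not part of the role's actual codebase.

```python
import logging

logger = logging.getLogger("pipeline.transforms")

class DataQualityError(ValueError):
    """Raised when a record fails a required-field check."""

def normalize_record(record: dict) -> dict:
    """Pure, framework-agnostic transform: unit-testable with no Spark or orchestrator imports."""
    if "id" not in record:
        raise DataQualityError("record is missing required field 'id'")
    return {"id": str(record["id"]), "name": str(record.get("name", "")).strip().lower()}

def normalize_batch(records: list) -> list:
    """Apply the transform, logging and skipping bad records instead of failing the whole run."""
    out = []
    for rec in records:
        try:
            out.append(normalize_record(rec))
        except DataQualityError as exc:
            logger.warning("skipping bad record %r: %s", rec, exc)
    return out
```

Keeping `normalize_record` free of Spark and orchestrator imports is what makes the logic portable across runtimes, which is the point of the framework-agnostic bullet above.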
Azure Synapse & Cloud Data Integration
- Hands-on development with Azure Synapse Analytics, including:
  - Synapse Pipelines (data movement and orchestration)
  - Synapse Notebooks (Spark-based transformations)
  - Linked Services and Integration Runtimes
- Participate in or lead data platform migrations (on-prem → Azure, legacy ETL → Synapse / Spark)
- Optimize pipeline execution for cost, scalability, and reliability in Azure environments
- Integrate Synapse with ADLS Gen2, SQL pools, and downstream analytics tools
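For the ADLS Gen2 integration above, Spark and Synapse address the lake through `abfss://` URIs. A hypothetical helper for building medallion-layer paths; the account, container, and layer names are illustrative, only the URI scheme is the real ADLS Gen2 convention.

```python
LAYERS = {"bronze", "silver", "gold"}  # medallion-style curated layers

def adls_path(container: str, account: str, layer: str, dataset: str) -> str:
    """Build an abfss:// URI for a dataset in a given medallion layer on ADLS Gen2."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer {layer!r}, expected one of {sorted(LAYERS)}")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{layer}/{dataset}"

# A Synapse Spark notebook would then read, e.g.:
# df = spark.read.parquet(adls_path("lake", "contosodata", "silver", "orders"))
```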
Data Orchestration & Dagster (Preferred)
- Design and operate Dagster-based orchestration using asset-centric patterns
- Implement:
  - Software-defined data assets
  - Sensors and schedules for event-driven and time-based processing
  - Dependency-aware pipeline execution
- Monitor and debug pipeline runs, lineage, and failures
- (Preferred) Experience deploying and operating pipelines in Dagster+ (formerly Dagster Cloud) or similar managed environments
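The dependency-aware execution that asset-centric orchestrators like Dagster provide can be sketched with the standard library's topological sorter. The asset names are illustrative; real Dagster code would declare `@asset` definitions and let the framework derive this ordering.

```python
from graphlib import TopologicalSorter

# Each asset maps to the set of upstream assets it depends on (hypothetical graph)
asset_deps = {
    "raw_orders": set(),
    "clean_orders": {"raw_orders"},
    "orders_by_region": {"clean_orders"},
    "daily_report": {"clean_orders", "orders_by_region"},
}

def run_order(deps: dict) -> list:
    """Return an execution order in which every asset runs after all its upstreams."""
    return list(TopologicalSorter(deps).static_order())

order = run_order(asset_deps)
assert order.index("raw_orders") < order.index("clean_orders")
assert order.index("orders_by_region") < order.index("daily_report")
```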
Core Data Engineering & Architecture
- Design robust ETL / ELT pipelines aligned to analytics and reporting needs
- Apply solid data modeling principles (fact/dimension models, curated layers, semantic consistency)
- Architect and maintain data lake solutions using ADLS Gen2 and medallion-style layering
- Write and optimize SQL for transformations, validation, and analytics consumption
- Enforce data quality, schema evolution controls, and observability across the platform
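A minimal sketch of the SQL-for-transformation-and-validation work above, using in-memory SQLite in place of Synapse SQL pools; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# Transformation: curated, analytics-ready aggregate over a fact/dimension join
rows = conn.execute("""
    SELECT d.category, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_product d USING (product_id)
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

# Validation: every fact row must resolve to a known dimension key
orphans = conn.execute("""
    SELECT COUNT(*) FROM fact_sales f
    LEFT JOIN dim_product d USING (product_id)
    WHERE d.product_id IS NULL
""").fetchone()[0]

print(rows)  # [('books', 15.0), ('games', 7.5)]
assert orphans == 0
```

The orphan-count query is the kind of referential-integrity check that feeds the data-quality enforcement mentioned above.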
Interested candidates, please share your updated CVs at [Confidential Information]