Location Name: Pune Corporate Office - Mantri
Job Purpose
Build and maintain reliable, scalable batch and real-time data pipelines on the Enterprise Data Platform to enable analytics, reporting, and downstream applications. The role delivers high-quality data engineering solutions using SQL, Python, and PySpark, with a strong focus on streaming, Change Data Capture (CDC), and database mirroring to ensure timely, trusted data delivery.
Duties And Responsibilities
- Design, develop, and optimize data pipelines using SQL, Python, and PySpark on cloud data platforms.
- Implement and operate real-time/streaming data ingestion (e.g., Spark Structured Streaming/Kafka) including schema evolution and late-arriving data handling.
- Set up and manage CDC frameworks and database mirroring for near-real-time replication and minimal-latency updates.
- Build robust data models and curated datasets for analytics, dashboards, and application consumption.
- Ensure data quality, lineage, and observability (validation, alerting, SLAs/SLOs) across batch and streaming workloads.
- Drive performance tuning and cost optimization (partitioning, file formats, caching, autoscaling).
- Harden solutions with security best practices (access controls, PII handling), governance, and compliance standards.
- Contribute to CI/CD using Git/GitHub and DevOps pipelines; automate testing and deployments.
- Partner with Data Platform, BI/Analytics, and Application teams to translate requirements into technical solutions.
- Provide L2/L3 support for pipelines and jobs; troubleshoot incidents, perform RCA, and implement preventive fixes.
- Create and maintain technical documentation and runbooks; participate in code reviews and knowledge sharing.
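The CDC and mirroring duties above can be illustrated with a minimal, library-free sketch of applying log-based change events (insert/update/delete) to a keyed target table. The event shape and function name here are hypothetical simplifications; production pipelines would typically consume such events from a tool like Debezium via Kafka and merge them with PySpark.

```python
# Hypothetical sketch: upsert/delete semantics for log-based CDC events
# applied to an in-memory table keyed by primary key. Real pipelines
# would merge these events into a lake table (e.g., via Spark), not a dict.

def apply_cdc_events(table, events):
    """Apply change events to `table` (dict keyed by primary key) in order."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            table[key] = event["row"]   # upsert: last write for a key wins
        elif op == "delete":
            table.pop(key, None)        # idempotent delete: missing key is a no-op
        else:
            raise ValueError(f"unknown op: {op}")
    return table

# Example usage with a small change stream
customers = {1: {"name": "Asha"}}
events = [
    {"op": "update", "key": 1, "row": {"name": "Asha K."}},
    {"op": "insert", "key": 2, "row": {"name": "Ravi"}},
    {"op": "delete", "key": 1},
]
apply_cdc_events(customers, events)
```

Ordering matters: events for the same key must be applied in commit order, which is why CDC pipelines typically partition the event stream by primary key.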
Key Decisions / Dimensions
- Select appropriate ingestion patterns (batch vs. streaming), CDC/mirroring approaches, and storage formats.
- Define partitioning, indexing, and optimization strategies to meet SLAs.
- Recommend tooling and frameworks for orchestration, testing, and observability.
- Prioritize defect fixes and enhancements based on impact and risk.
Major Challenges
- Maintaining reliability and low latency for mission-critical streaming and CDC pipelines.
- Managing schema changes and data drift across diverse source systems.
- Balancing feature delivery with production support within tight timelines.
- Optimizing performance and cost at scale across environments.
Educational Qualifications
Required Qualifications and Experience
- Graduate or Postgraduate degree in Computer Science, Information Technology, or Data Science/Technologies.
Work Experience
- 3-4 years of hands-on data engineering experience.
Technical Expertise / Skills Keywords
- SQL, Python, PySpark
- Data streaming (e.g., Spark Structured Streaming, Kafka), CDC (e.g., Debezium/Log-based), Database Mirroring
- Data modeling, performance tuning, and optimization
- Version control (Git/GitHub) and DevOps pipelines (e.g., Azure DevOps)
- Preferred: Azure Databricks, Azure Data Factory, Data Lake Storage; experience with orchestration and observability tools.