Bajaj Finserv

Senior Data Engineer

3-5 Years

Job Description

Location Name: Pune Corporate Office - Mantri

Job Purpose

Build and maintain reliable, scalable batch and real-time data pipelines on the Enterprise Data Platform to enable analytics, reporting, and downstream applications. The role delivers high-quality data engineering solutions using SQL, Python, and PySpark, with strong focus on streaming, Change Data Capture (CDC), and database mirroring to ensure timely, trusted data delivery.

Duties And Responsibilities

  • Design, develop, and optimize data pipelines using SQL, Python, and PySpark on cloud data platforms.
  • Implement and operate real-time/streaming data ingestion (e.g., Spark Structured Streaming/Kafka) including schema evolution and late-arriving data handling.
  • Set up and manage CDC frameworks and database mirroring for near-real-time replication with minimal-latency updates.
  • Build robust data models and curated datasets for analytics, dashboards, and application consumption.
  • Ensure data quality, lineage, and observability (validation, alerting, SLAs/SLOs) across batch and streaming workloads.
  • Drive performance tuning and cost optimization (partitioning, file formats, caching, autoscaling).
  • Harden solutions with security best practices (access controls, PII handling), governance, and compliance standards.
  • Contribute to CI/CD using Git/GitHub and DevOps pipelines; automate testing and deployments.
  • Partner with Data Platform, BI/Analytics, and Application teams to translate requirements into technical solutions.
  • Provide L2/L3 support for pipelines and jobs; troubleshoot incidents, perform RCA, and implement preventive fixes.
  • Create and maintain technical documentation and runbooks; participate in code reviews and knowledge sharing.
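As an illustration only (not part of the posting), the CDC responsibility above reduces to applying an ordered stream of change events (insert/update/delete) to a target dataset with upsert semantics. A minimal pure-Python sketch, with all names hypothetical:

```python
# Hypothetical sketch: apply ordered CDC events to a target table
# keyed by primary key. Inserts and updates are treated as upserts;
# deletes are idempotent.

def apply_cdc_events(target: dict, events: list) -> dict:
    """Apply ordered change events to a key -> row mapping."""
    for event in events:
        op, key, row = event["op"], event["key"], event.get("row")
        if op in ("insert", "update"):
            target[key] = row          # upsert semantics
        elif op == "delete":
            target.pop(key, None)      # idempotent delete
        else:
            raise ValueError(f"unknown op: {op}")
    return target

table = {1: {"name": "a"}}
events = [
    {"op": "insert", "key": 2, "row": {"name": "b"}},
    {"op": "update", "key": 1, "row": {"name": "a2"}},
    {"op": "delete", "key": 2},
]
apply_cdc_events(table, events)
```

In production this logic would typically be handled by a log-based CDC tool (e.g., Debezium) feeding a merge/upsert into the lakehouse, rather than hand-written code.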

Key Decisions / Dimensions


  • Select appropriate ingestion patterns (batch vs. streaming), CDC/mirroring approaches, and storage formats.
  • Define partitioning, indexing, and optimization strategies to meet SLAs.
  • Recommend tooling and frameworks for orchestration, testing, and observability.
  • Prioritize defect fixes and enhancements based on impact and risk.

Major Challenges


  • Maintaining reliability and low latency for mission-critical streaming and CDC pipelines.
  • Managing schema changes and data drift across diverse source systems.
  • Balancing feature delivery with production support within tight timelines.
  • Optimizing performance and cost at scale across environments.

Educational Qualifications


Required Qualifications and Experience

  • Graduate or Postgraduate degree in Computer Science, Information Technology, or Data Science/Technologies.

Work Experience


  • 3-4 years of hands-on data engineering experience.

Technical Expertise / Skills Keywords


  • SQL, Python, PySpark
  • Data streaming (e.g., Spark Structured Streaming, Kafka), CDC (e.g., Debezium/Log-based), Database Mirroring
  • Data modeling, performance tuning, and optimization
  • Version control (Git/GitHub) and DevOps pipelines (e.g., Azure DevOps)
  • Preferred: Azure Databricks, Azure Data Factory, Data Lake Storage; experience with orchestration and observability tools.

Job ID: 144902451
