
talentgigs

Senior Data Engineer

  • Posted 12 hours ago

Job Description

Data Engineer – Python & PySpark (AWS)

Experience: 6+ years

Location: Chennai (client site, Guindy)

Work Mode: Work from office, all 5 days

Role Overview

We're seeking a Senior Data Engineer with strong hands-on expertise in Python, PySpark, and AWS to design, build, and operate scalable data pipelines for analytics and operational workloads. You will own end-to-end data engineering — from ingestion to delivery — across batch and near-real-time patterns, with a focus on pipeline reliability, data quality, and cloud-native engineering.

This is an engineering role — you will build and own production systems, not just develop queries or reports. It is an individual contributor position: you are expected to influence architecture, drive technical decisions, and mentor peers, without formal people-management responsibilities.

Key Responsibilities

  • Design, build, and maintain scalable ETL/ELT pipelines using Python and PySpark to ingest data from files, APIs, RDBMS, NoSQL, and message queues into AWS data stores (a minimal PySpark sketch follows this list)
  • Develop and optimize PySpark jobs for large-scale data processing, transformation, and aggregation on AWS EMR or Databricks
  • Build and manage AWS-native data workflows using S3, Glue, Lambda, Redshift, and Step Functions; manage scheduling, dependencies, retries, and SLAs
  • Write production-grade Python for data transformations, validations, orchestration, and API integrations — not notebook-level scripting
  • Implement robust data quality controls: schema validation, deduplication, referential integrity checks, and anomaly detection
  • Collaborate on data modeling (star/snowflake, data vault, medallion architecture) and define data contracts and lineage
  • Build CI/CD for data pipelines (tests, linting, packaging) using GitHub Actions or AWS CodePipeline; maintain documentation and runbooks
  • Troubleshoot and remediate production issues; drive continuous improvement in performance, reliability, and cost
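
As a rough illustration of the pipeline and data quality work described above, here is a minimal PySpark sketch; all bucket paths, table names, and columns are hypothetical, not taken from this posting.

```python
# Illustrative PySpark batch job: ingest raw JSON from S3, apply basic
# data quality controls, and write curated Parquet. All paths and column
# names below are hypothetical.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("orders_batch_etl").getOrCreate()

# Ingest raw JSON landed in S3 (hypothetical path)
raw = spark.read.json("s3://example-raw-zone/orders/2024-01-01/")

# Data quality controls: enforce required keys, then deduplicate on the
# business key, keeping only the latest record per order_id
w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
clean = (
    raw.filter(F.col("order_id").isNotNull() & F.col("customer_id").isNotNull())
       .withColumn("updated_at", F.to_timestamp("updated_at"))
       .withColumn("rn", F.row_number().over(w))
       .filter(F.col("rn") == 1)
       .drop("rn")
)

# Aggregate per customer per day and write partitioned Parquet
daily = (
    clean.groupBy("customer_id", F.to_date("updated_at").alias("order_date"))
         .agg(F.sum("amount").alias("total_amount"),
              F.count("order_id").alias("order_count"))
)
daily.write.mode("overwrite").partitionBy("order_date") \
     .parquet("s3://example-curated-zone/daily_orders/")
```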
Required Qualifications

  • 6–8 years of hands-on data engineering experience with a strong focus on Python, PySpark, and AWS
  • Production-grade Python skills — modular code, logging, error handling, REST/GraphQL API integration; not scripting or notebook-only work
  • Strong PySpark expertise — job optimization, partitioning, caching, broadcast joins, performance tuning at scale (illustrated in the first sketch after this list)
  • Hands-on AWS experience: S3, Glue, Lambda, Redshift, EMR, Step Functions, IAM
  • Advanced SQL skills — complex joins, window functions, query optimization; Oracle or PostgreSQL experience is a plus
  • Solid understanding of ETL/ELT patterns, data modeling, and data quality frameworks
  • Experience with workflow orchestration: Apache Airflow, AWS Step Functions, or equivalent (a minimal Airflow DAG sketch follows this list)
  • CI/CD experience: GitHub Actions, AWS CodePipeline, Azure DevOps, or GitLab CI
  • Strong engineering mindset — you build it, you own it, you operate it
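
The PySpark tuning skills called out above (broadcast joins, partitioning, window functions) can be sketched briefly; again, all dataset paths and columns are hypothetical.

```python
# Illustrative PySpark tuning patterns: broadcast join, window function,
# and partition-aware writes. Paths and columns are hypothetical.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("tuning_examples").getOrCreate()

facts = spark.read.parquet("s3://example-curated-zone/daily_orders/")
dims = spark.read.parquet("s3://example-curated-zone/customers/")  # small table

# Broadcast the small dimension table to avoid shuffling the large fact table
enriched = facts.join(F.broadcast(dims), "customer_id", "left")

# Window function: rank each customer's days by spend, keep the top 3
w = Window.partitionBy("customer_id").orderBy(F.col("total_amount").desc())
top_days = (
    enriched.withColumn("spend_rank", F.row_number().over(w))
            .filter(F.col("spend_rank") <= 3)
)

# Repartition on the write key to control file counts before persisting
top_days.repartition("order_date").write.mode("overwrite") \
        .partitionBy("order_date").parquet("s3://example-curated-zone/top_days/")
```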
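
For orchestration, a minimal Apache Airflow DAG might look like the following; this assumes Airflow 2.4+ with the TaskFlow API, and the DAG id, schedule, and task bodies are placeholders.

```python
# Minimal Airflow DAG sketch (assumes Airflow 2.4+ TaskFlow API).
# DAG id, schedule, and task bodies are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task

@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2},
)
def orders_pipeline():
    @task
    def land_raw_data() -> str:
        # e.g. verify or trigger ingestion of the day's files
        return "s3://example-raw-zone/orders/"

    @task
    def run_spark_job(path: str) -> None:
        # e.g. submit the PySpark job sketched earlier to EMR or Glue
        print(f"processing {path}")

    run_spark_job(land_raw_data())

orders_pipeline()
```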

Nice-to-Have

  • Experience with Kafka, Kinesis, or Event Hubs for event streaming
  • Familiarity with dbt for ELT and data testing
  • Great Expectations or Deequ for data quality
  • Docker / Kubernetes for containerized data services
  • Knowledge of data security patterns: PII masking, tokenization, column/row-level security
  • Informatica, ODI, or SSIS exposure

Education & Certifications

  • Bachelor's/Master's in Computer Science, Engineering, or equivalent
  • AWS Certified Data Engineer, AWS Solutions Architect, Databricks, or Snowflake certifications are a plus

Job ID: 145642797