
talentgigs

Senior Data Engineer

  • Posted 12 hours ago

Job Description

Data Engineer – Python & PySpark (AWS)

Experience: 6+ years

Location: Chennai (client site, Guindy)

Work Mode: Work from office, all 5 days

Role Overview

We're seeking a Senior Data Engineer with strong hands-on expertise in Python, PySpark, and AWS to design, build, and operate scalable data pipelines for analytics and operational workloads. You will own end-to-end data engineering — from ingestion to delivery — across batch and near-real-time patterns, with a focus on pipeline reliability, data quality, and cloud-native engineering.

This is an engineering role — you will build and own production systems, not just develop queries or reports. It is an individual contributor position: you are expected to influence architecture, drive technical decisions, and mentor peers, without formal people-management responsibilities.

Key Responsibilities

  • Design, build, and maintain scalable ETL/ELT pipelines using Python and PySpark to ingest data from files, APIs, RDBMS, NoSQL, and message queues into AWS data stores (a minimal PySpark sketch follows this list)
  • Develop and optimize PySpark jobs for large-scale data processing, transformation, and aggregation on AWS EMR or Databricks
  • Build and manage AWS-native data workflows using S3, Glue, Lambda, Redshift, and Step Functions; manage scheduling, dependencies, retries, and SLAs
  • Write production-grade Python for data transformations, validations, orchestration, and API integrations — not notebook-level scripting
  • Implement robust data quality controls: schema validation, deduplication, referential integrity checks, and anomaly detection
  • Collaborate on data modeling (star/snowflake, data vault, medallion architecture) and define data contracts and lineage
  • Build CI/CD for data pipelines (tests, linting, packaging) using GitHub Actions or AWS CodePipeline; maintain documentation and runbooks
  • Troubleshoot and remediate production issues; drive continuous improvement in performance, reliability, and cost
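
As a rough illustration of the pipeline and data quality work described above, here is a minimal PySpark sketch; all bucket paths, table names, and columns are hypothetical, not taken from this posting.

```python
# Illustrative PySpark batch job: ingest raw JSON from S3, apply basic
# data quality controls, and write curated Parquet. All paths and column
# names below are hypothetical.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("orders_batch_etl").getOrCreate()

# Ingest raw JSON landed in S3 (hypothetical path)
raw = spark.read.json("s3://example-raw-zone/orders/2024-01-01/")

# Data quality controls: enforce required keys, then deduplicate on the
# business key, keeping only the latest record per order_id
w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
clean = (
    raw.filter(F.col("order_id").isNotNull() & F.col("customer_id").isNotNull())
       .withColumn("updated_at", F.to_timestamp("updated_at"))
       .withColumn("rn", F.row_number().over(w))
       .filter(F.col("rn") == 1)
       .drop("rn")
)

# Aggregate per customer per day and write partitioned Parquet
daily = (
    clean.groupBy("customer_id", F.to_date("updated_at").alias("order_date"))
         .agg(F.sum("amount").alias("total_amount"),
              F.count("order_id").alias("order_count"))
)
daily.write.mode("overwrite").partitionBy("order_date") \
     .parquet("s3://example-curated-zone/daily_orders/")
```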
Required Qualifications

  • 6–8 years of hands-on data engineering experience with a strong focus on Python, PySpark, and AWS
  • Production-grade Python skills — modular code, logging, error handling, REST/GraphQL API integration; not scripting or notebook-only work
  • Strong PySpark expertise — job optimization, partitioning, caching, broadcast joins, performance tuning at scale (illustrated in the first sketch after this list)
  • Hands-on AWS experience: S3, Glue, Lambda, Redshift, EMR, Step Functions, IAM
  • Advanced SQL skills — complex joins, window functions, query optimization; Oracle or PostgreSQL experience is a plus
  • Solid understanding of ETL/ELT patterns, data modeling, and data quality frameworks
  • Experience with workflow orchestration: Apache Airflow, AWS Step Functions, or equivalent (a minimal Airflow DAG sketch follows this list)
  • CI/CD experience: GitHub Actions, AWS CodePipeline, Azure DevOps, or GitLab CI
  • Strong engineering mindset — you build it, you own it, you operate it
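
The PySpark tuning skills called out above (broadcast joins, partitioning, window functions) can be sketched briefly; again, all dataset paths and columns are hypothetical.

```python
# Illustrative PySpark tuning patterns: broadcast join, window function,
# and partition-aware writes. Paths and columns are hypothetical.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("tuning_examples").getOrCreate()

facts = spark.read.parquet("s3://example-curated-zone/daily_orders/")
dims = spark.read.parquet("s3://example-curated-zone/customers/")  # small table

# Broadcast the small dimension table to avoid shuffling the large fact table
enriched = facts.join(F.broadcast(dims), "customer_id", "left")

# Window function: rank each customer's days by spend, keep the top 3
w = Window.partitionBy("customer_id").orderBy(F.col("total_amount").desc())
top_days = (
    enriched.withColumn("spend_rank", F.row_number().over(w))
            .filter(F.col("spend_rank") <= 3)
)

# Repartition on the write key to control file counts before persisting
top_days.repartition("order_date").write.mode("overwrite") \
        .partitionBy("order_date").parquet("s3://example-curated-zone/top_days/")
```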
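
For orchestration, a minimal Apache Airflow DAG might look like the following; this assumes Airflow 2.4+ with the TaskFlow API, and the DAG id, schedule, and task bodies are placeholders.

```python
# Minimal Airflow DAG sketch (assumes Airflow 2.4+ TaskFlow API).
# DAG id, schedule, and task bodies are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task

@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2},
)
def orders_pipeline():
    @task
    def land_raw_data() -> str:
        # e.g. verify or trigger ingestion of the day's files
        return "s3://example-raw-zone/orders/"

    @task
    def run_spark_job(path: str) -> None:
        # e.g. submit the PySpark job sketched earlier to EMR or Glue
        print(f"processing {path}")

    run_spark_job(land_raw_data())

orders_pipeline()
```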

Nice-to-Have

  • Experience with Kafka, Kinesis, or Event Hubs for event streaming
  • Familiarity with dbt for ELT and data testing
  • Great Expectations or Deequ for data quality
  • Docker / Kubernetes for containerized data services
  • Knowledge of data security patterns: PII masking, tokenization, column/row-level security
  • Informatica, ODI, or SSIS exposure

Education & Certifications

  • Bachelor's/Master's in Computer Science, Engineering, or equivalent
  • AWS Certified Data Engineer, AWS Solutions Architect, Databricks, or Snowflake certifications are a plus

Job ID: 145642797