Job Title: Senior Data Engineer (Databricks Certified)
Experience: 8+ Years
Location: Chennai
Work Mode: Onsite
Job Summary:
We are seeking a highly skilled Senior Data Engineer with 8+ years of experience building scalable data platforms and pipelines. The ideal candidate holds a Databricks certification and has strong expertise in Spark Structured Streaming, distributed data processing, and cloud-based data engineering. This role involves designing, developing, and optimizing modern data solutions that support both real-time and batch workloads.
Key Responsibilities:
- Design and develop scalable data pipelines using Apache Spark, PySpark, and Spark Structured Streaming.
- Build and optimize complex data workflows on Azure Databricks or Databricks on AWS.
- Implement real-time data streaming solutions using Structured Streaming or Delta Live Tables (DLT); an illustrative sketch follows this list.
- Work with Delta Lake, data lakehouse architectures, and medallion frameworks.
- Develop ETL/ELT pipelines integrating structured, semi-structured, and unstructured data.
- Collaborate with data architects and analysts to design robust data models and transformations.
- Optimize Spark jobs for performance, reliability, and cost efficiency.
- Use CI/CD practices for data pipeline deployments (Azure DevOps / GitHub Actions / Jenkins).
- Work with cloud storage and compute services such as ADLS, S3, Azure Synapse, AWS Glue, and Azure Data Factory, depending on the cloud platform.
- Ensure data quality, governance, and security standards are met throughout the pipeline.
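
For illustration, a minimal sketch of the kind of Structured Streaming work this role involves: ingesting raw JSON files from cloud storage into a bronze Delta table. The path, schema, and table name are hypothetical placeholders, and on Databricks the `spark` session is already provided; it is created explicitly here only so the sketch is self-contained.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

# On Databricks, `spark` is pre-created; built here for self-containedness.
spark = SparkSession.builder.appName("bronze-ingest").getOrCreate()

# Read a stream of JSON files landing in cloud storage.
# Path, schema, and table name below are hypothetical placeholders.
raw = (
    spark.readStream
    .format("json")
    .schema("event_id STRING, event_ts TIMESTAMP, payload STRING")
    .load("/mnt/landing/events/")
)

# Stamp each record with its ingestion time and append it to a bronze
# Delta table, using a checkpoint location for fault-tolerant recovery.
query = (
    raw.withColumn("ingested_at", current_timestamp())
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze_events/")
    .outputMode("append")
    .toTable("bronze.events")
)
```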
Required Skills & Qualifications:
- 8+ years of experience as a Data Engineer in large-scale environments.
- Databricks Certified (Data Engineer Associate/Professional, or Databricks Certified Associate Developer for Apache Spark).
- Strong hands-on experience with Spark Structured Streaming and real-time data processing.
- Expert in PySpark, SQL, Delta Lake, and distributed data processing.
- Proficiency in one major cloud platform: Azure / AWS / GCP.
- Strong experience with ETL/ELT design, performance tuning, and pipeline orchestration.
- Experience with data modeling, partitioning strategies, and big data storage formats (Parquet, ORC, Avro).
- Solid understanding of DevOps, version control (Git), and CI/CD workflows.
Nice to Have:
- Experience with Databricks Unity Catalog, governance, and lineage.
- Knowledge of Kafka, Azure Event Hubs, or Amazon Kinesis for streaming ingestion.
- Background in Airflow, dbt, or similar orchestration and transformation tools.
- Knowledge of machine learning pipelines and MLOps concepts.