
Key Responsibilities:
Design, build and maintain scalable batch and (optionally) streaming data pipelines using Apache Spark 3.x, Scala, Python and SQL.
Implement core ETL/ELT logic in Scala and Python; author efficient Spark DataFrame/Dataset jobs.
Write and optimize complex SQL for ingestion, transformation and consumption layers.
Tune Spark jobs for performance and cost: partitioning, join strategies, broadcast joins, memory tuning and shuffle reduction (see the sketch after this list).
Ensure code quality via unit tests, integration tests, CI/CD and code reviews.
Work with data modeling, schema evolution and data quality checks to ensure reliable outputs.
Collaborate with platform/DevOps teams to deploy and monitor pipelines (retries, logging, alerting).
Troubleshoot production issues and perform root-cause analysis.
Mentor and guide junior engineers, share best practices and drive improvements to the data platform.
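As a rough illustration of the pipeline and tuning work described above, the following is a minimal Spark 3.x Scala sketch; the paths, table names and columns are hypothetical, not part of the role description.
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, col}

object OrdersEnrichmentJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-enrichment")
      .getOrCreate()

    // Hypothetical input locations; real jobs would read these from configuration.
    val orders    = spark.read.parquet("s3://example-bucket/raw/orders/")
    val customers = spark.read.parquet("s3://example-bucket/raw/customers/")

    // Broadcast the small dimension table to avoid a shuffle-heavy sort-merge join.
    val enriched = orders
      .join(broadcast(customers), Seq("customer_id"), "left")
      .filter(col("order_status") =!= "CANCELLED")

    // Repartition by the write key to control output file sizes and partition layout.
    enriched
      .repartition(col("order_date"))
      .write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://example-bucket/curated/orders_enriched/")

    spark.stop()
  }
}
```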
Qualifications:
5-8 years of industry experience as a Data Engineer.
Strong hands-on experience with Apache Spark 3.x (development, tuning, debugging).
Proficient in Scala programming for Spark ETL/ELT development (datasets, RDDs when required).
Experience with streaming frameworks such as Spark Structured Streaming and Kafka.
Proficient in Python programming.
Advanced SQL skills (complex queries, window functions, CTEs, query optimization); see the sketch after this list.
Experience building production-grade ETL/ELT workflows, testing and monitoring them.
Experience with orchestration tools such as Apache Airflow.
Strong software engineering fundamentals: version control (Git), modular code, code reviews, CI/CD pipelines.
Experience with cloud object storage and common data formats (Parquet/ORC/Avro).
Strong problem solving, communication and collaboration skills.
Experience with AWS Glue, EMR, Athena, Redshift and S3 is a plus.
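For the SQL skills listed above, a minimal sketch of a CTE with a window function, run through the Spark SQL interface in Scala; the analytics.orders table and its columns are assumed for illustration only.
```scala
import org.apache.spark.sql.SparkSession

object LatestOrderPerCustomer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("latest-order-per-customer")
      .getOrCreate()

    // CTE ranks each customer's orders by timestamp; the outer query keeps the latest one.
    val latest = spark.sql(
      """
        |WITH ranked AS (
        |  SELECT customer_id,
        |         order_id,
        |         order_ts,
        |         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_ts DESC) AS rn
        |  FROM analytics.orders
        |)
        |SELECT customer_id, order_id, order_ts
        |FROM ranked
        |WHERE rn = 1
        |""".stripMargin)

    latest.write.mode("overwrite").saveAsTable("analytics.latest_order_per_customer")

    spark.stop()
  }
}
```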
Job ID: 132698599