Job Summary
We are looking for a Senior PySpark Developer with 3 to 6 years of experience building and optimizing data pipelines using PySpark on Databricks in AWS cloud environments. The role focuses on modernizing legacy domains, including integration with systems such as Kafka, and requires close collaboration with cross-functional teams.
Key Responsibilities
- Develop and optimize scalable PySpark applications on Databricks.
- Work with AWS services (S3, EMR, Lambda, Glue) for cloud-native data processing.
- Integrate streaming and batch data sources, especially using Kafka (see the sketch after this list).
- Tune Spark jobs for performance, memory, and compute efficiency.
- Collaborate with DevOps, product, and analytics teams on delivery and deployment.
- Ensure data governance, lineage, and quality compliance across all pipelines.
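For illustration only, a minimal sketch of the kind of Kafka-to-Databricks streaming integration described above. The topic name, broker address, S3 paths, and event schema are placeholders, not part of the actual role:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

# Hypothetical schema for incoming events
event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", TimestampType()),
])

# Read the Kafka topic as a streaming source (placeholder broker and topic)
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "orders")
       .option("startingOffsets", "latest")
       .load())

# Kafka values arrive as bytes; cast to string and parse the JSON payload
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Write to a Delta table on S3; the checkpoint location tracks streaming progress
query = (events.writeStream
         .format("delta")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/orders")
         .outputMode("append")
         .start("s3://example-bucket/delta/orders"))
```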
Required Skills
- 3 to 6 years of hands-on development in PySpark.
- Experience with Databricks and performance tuning using Spark UI.
- Strong understanding of AWS services, Kafka, and distributed data processing.
- Proficient in partitioning, caching, join optimization, and resource configuration (see the tuning sketch below).
- Familiarity with data formats like Parquet, Avro, and ORC.
- Exposure to orchestration tools (Airflow, Databricks Workflows).
- Scala experience is a strong plus.
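For illustration only, a minimal sketch of the broadcast-join, caching, and partitioning decisions this role involves. The S3 paths, column names, and shuffle-partition value are assumptions, not a prescribed configuration:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("join-tuning-sketch")
         # Shuffle partition count is a placeholder; size it to the cluster's cores
         .config("spark.sql.shuffle.partitions", "200")
         .getOrCreate())

# Placeholder paths: a large fact table and a small dimension table in Parquet
orders = spark.read.parquet("s3://example-bucket/curated/orders/")
customers = spark.read.parquet("s3://example-bucket/curated/customers/")

# Broadcast the small side so the large fact table avoids a shuffle
enriched = orders.join(broadcast(customers), on="customer_id", how="left")

# Cache only when the result feeds multiple downstream actions
enriched.cache()
enriched.count()  # materialize the cache

# Partition output by a low-cardinality column to keep file counts manageable
(enriched.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("s3://example-bucket/marts/orders_enriched/"))
```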