Job Summary
We are looking for a Senior PySpark Developer with 3 to 6 years of experience building and optimizing data pipelines using PySpark on Databricks in AWS cloud environments. The role focuses on modernizing legacy domains, including integration with systems such as Kafka, and requires close collaboration with cross-functional teams.
Key Responsibilities
- Develop and optimize scalable PySpark applications on Databricks.
- Work with AWS services (S3, EMR, Lambda, Glue) for cloud-native data processing.
- Integrate streaming and batch data sources, especially using Kafka (see the sketch after this list).
- Tune Spark jobs for performance, memory, and compute efficiency.
- Collaborate with DevOps, product, and analytics teams on delivery and deployment.
- Ensure data governance, lineage, and quality compliance across all pipelines.
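For illustration only, a minimal sketch of the kind of Kafka-to-Databricks streaming integration described above. The topic name, broker address, S3 paths, and event schema are placeholders, not part of the actual role:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

# Hypothetical schema for incoming events
event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", TimestampType()),
])

# Read the Kafka topic as a streaming source (placeholder broker and topic)
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "orders")
       .option("startingOffsets", "latest")
       .load())

# Kafka values arrive as bytes; cast to string and parse the JSON payload
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Write to a Delta table on S3; the checkpoint location tracks streaming progress
query = (events.writeStream
         .format("delta")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/orders")
         .outputMode("append")
         .start("s3://example-bucket/delta/orders"))
```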
Required Skills
- 3 to 6 years of hands-on development in PySpark.
- Experience with Databricks and performance tuning using Spark UI.
- Strong understanding of AWS services, Kafka, and distributed data processing.
- Proficient in partitioning, caching, join optimization, and resource configuration (see the tuning sketch below).
- Familiarity with data formats like Parquet, Avro, and ORC.
- Exposure to orchestration tools (Airflow, Databricks Workflows).
- Scala experience is a strong plus.
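For illustration only, a minimal sketch of the broadcast-join, caching, and partitioning decisions this role involves. The S3 paths, column names, and shuffle-partition value are assumptions, not a prescribed configuration:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("join-tuning-sketch")
         # Shuffle partition count is a placeholder; size it to the cluster's cores
         .config("spark.sql.shuffle.partitions", "200")
         .getOrCreate())

# Placeholder paths: a large fact table and a small dimension table in Parquet
orders = spark.read.parquet("s3://example-bucket/curated/orders/")
customers = spark.read.parquet("s3://example-bucket/curated/customers/")

# Broadcast the small side so the large fact table avoids a shuffle
enriched = orders.join(broadcast(customers), on="customer_id", how="left")

# Cache only when the result feeds multiple downstream actions
enriched.cache()
enriched.count()  # materialize the cache

# Partition output by a low-cardinality column to keep file counts manageable
(enriched.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("s3://example-bucket/marts/orders_enriched/"))
```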