Job Summary
We are looking for an experienced Python PySpark Developer with a strong background in the banking/financial services domain. The ideal candidate will be responsible for building scalable data pipelines, processing large datasets, and supporting data-driven decision-making across risk, compliance, and core banking functions.
Key Responsibilities
- Develop and maintain scalable data pipelines using Python and PySpark
- Work with large datasets in distributed environments (Spark, Hadoop)
- Perform data transformation, cleansing, and aggregation for banking use cases
- Collaborate with data engineers, analysts, and business stakeholders
- Optimize Spark jobs for performance and efficiency
- Integrate data from multiple sources including APIs, databases, and flat files
- Ensure data quality, governance, and compliance with banking standards
- Support batch and real-time data processing workflows
Required Skills
- Strong hands-on experience in Python and PySpark
- Experience working with Apache Spark and the Hadoop ecosystem
- Good knowledge of SQL and data modeling
- Experience with ETL pipelines and data engineering concepts
- Familiarity with cloud platforms (AWS / Azure / GCP)
- Understanding of data warehousing concepts
- Experience with version control tools like Git
Preferred Skills
- Experience in the Banking / Financial Services domain
- Knowledge of risk, compliance, or regulatory reporting
- Experience with tools like Airflow, Kafka, Databricks
- Exposure to CI/CD pipelines