PySpark Scala Developer

5-7 Years

Job Description

Role: PySpark/Scala Developer

Experience: 5+ years

Location: Pan India

Functional Skills: Experience in Credit Risk/Regulatory risk domain

Technical Skills: Spark, PySpark, Python, Hive, Scala, MapReduce, Unix shell scripting

Good to Have Skills: Exposure to Machine Learning Techniques

Job Description:

5+ years of experience developing, fine-tuning, and implementing programs/applications using Python/PySpark/Scala on a Big Data/Hadoop platform.

Roles and Responsibilities:

Work with a leading bank's Risk Management team on specific projects/requirements pertaining to risk models in consumer and wholesale banking

Enhance machine learning models using PySpark or Scala
Work with data scientists to build ML models based on business requirements and follow the ML cycle to deploy them all the way to the production environment
Participate in feature engineering, model training, scoring, and retraining
Architect data pipelines and automate data ingestion and model jobs

Skills and competencies:

Required:

Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance

Working experience in languages PySpark & Scala to develop code to validate and implement models and codes in

Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.
Familiarity with machine learning frameworks and libraries (such as scikit-learn, Spark ML, TensorFlow, PyTorch, etc.)
Experience in systems integration, web services, batch processing
Experience migrating code to PySpark/Scala is a big plus
Ability to act as a liaison, conveying the information needs of the business to IT and data constraints to the business; applies equal conveyance regarding business strategy and IT strategy, business processes, and workflow

Flexibility in approach and thought process
Willingness to learn and comprehend periodic changes in regulatory requirements as per the FED

Educational Qualification: Master's degree with a specialization in Statistics, Mathematics, or Finance, or an Engineering degree

Must-Have

5+ years of experience in data engineering, with a strong focus on PySpark/Python for big data processing.
Expertise in building data pipelines and ingestion frameworks from relational, semi-structured (JSON, XML), and unstructured sources (logs, PDFs).
Proficiency in Python with strong knowledge of data processing libraries.
Strong SQL skills for querying and validating data in platforms like Amazon Redshift, PostgreSQL, or similar.
Experience with distributed computing frameworks (e.g., Spark on EMR, Databricks).
Familiarity with workflow orchestration tools (e.g., AWS Step Functions, or similar).
Solid understanding of data lake / data warehouse architectures and data modeling basics.

Good-to-Have

Familiarity with Delta Lake or similar for large-scale data storage.
Exposure to real-time streaming frameworks (e.g., Spark Structured Streaming, Kafka).
Knowledge of data governance, lineage, and cataloging tools (e.g., AWS Glue Catalog, Apache Atlas).
Understanding of DevOps/CI-CD pipelines for data projects using Git, Jenkins, or similar tools.

Responsibilities of / Expectations from the Role

Design and build robust, scalable ETL/ELT pipelines using PySpark to ingest data from diverse sources (databases, logs, APIs, files).
Transform and curate raw transactional and log data into analysis-ready datasets in the Data Hub and analytical data marts.
Develop reusable and parameterized Spark jobs for batch and micro-batch processing.
Optimize performance and scalability of PySpark jobs across large data volumes.
Ensure data quality, consistency, lineage, and proper documentation across ingestion flows.
Collaborate with Data Architects, Modelers, and Data Scientists to implement ingestion logic aligned with business needs.
Work with cloud-based data platforms (e.g., AWS S3, Glue, EMR, Redshift) for data movement and storage.
Support version control, CI/CD, and infrastructure-as-code where applicable.