PySpark Developer

Experience: 6-8 years
Posted a month ago

Job Description

Key Skills & Responsibilities

  • Strong expertise in PySpark and Apache Spark for batch and real-time data processing.
  • Experience in designing and implementing ETL pipelines, including data ingestion, transformation, and validation (a brief sketch follows this list).
  • Proficiency in Python for scripting, automation, and building reusable components.
  • Hands-on experience with scheduling tools like Airflow or Control-M to orchestrate workflows (see the DAG sketch after the Preferred Skills section).
  • Familiarity with AWS ecosystem, especially S3 and related file system operations.
  • Strong understanding of Unix/Linux environments and Shell scripting.
  • Experience with Hadoop, Hive, and platforms like Cloudera or Hortonworks.
  • Ability to handle CDC (Change Data Capture) operations on large datasets.
  • Experience in performance tuning, optimizing Spark jobs, and troubleshooting.
  • Strong knowledge of data modeling, data validation, and writing unit test cases.
  • Exposure to real-time and batch integration with downstream/upstream systems.
  • Working knowledge of Jupyter Notebook, Zeppelin, or PyCharm for development and debugging.
  • Understanding of Agile methodologies, with experience in CI/CD tools (e.g., Jenkins, Git).
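
For illustration, a minimal sketch of the kind of batch ETL step described above. It assumes hypothetical S3 paths and an invented orders schema; it is not this role's actual pipeline:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl-sketch").getOrCreate()

# Ingest: raw CSV from S3 (bucket and prefix are illustrative placeholders).
raw = (
    spark.read
    .option("header", True)
    .csv("s3a://example-bucket/raw/orders/")
)

# Transform: normalize types and derive a partition column.
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
)

# Validate: keep rows with required keys; park the rest for review.
valid = orders.filter(F.col("order_id").isNotNull() & F.col("amount").isNotNull())
rejected = orders.subtract(valid)

# Load: curated Parquet partitioned by date, plus a reject area.
valid.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://example-bucket/curated/orders/")
rejected.write.mode("overwrite").parquet("s3a://example-bucket/rejects/orders/")

spark.stop()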

Preferred Skills

  • Experience in building or integrating APIs for data provisioning.
  • Exposure to ETL or reporting tools such as Informatica, Tableau, Jasper, or QlikView.
  • Familiarity with AI/ML model development using PySpark in cloud environments.
  • Skills: PySpark, Apache Spark, Python, SQL, ETL pipelines, ETL tools, Informatica, Tableau, Jasper, QlikView, Airflow, Control-M, AWS S3, Unix/Linux, Shell scripting, Hadoop, Hive, Cloudera, Hortonworks, CDC (Change Data Capture), performance tuning, data modeling, data validation, unit test cases, API integration, batch integration, real-time integration, AI/ML model development, CI/CD, Jenkins, Git, Jupyter Notebook, Zeppelin, PyCharm, Agile methodologies.
  • Mandatory Key Skills: Apache Spark, Python, ETL, Unix, Linux, data engineering, Agile methodologies, CI/CD, data modeling, data validation, PySpark.
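
And a minimal sketch of orchestrating such a job with Airflow, one of the two schedulers named above. It assumes Airflow 2.4+ and an illustrative script path; a Control-M setup would express the same schedule in its own job definitions:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_etl_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # hypothetical cadence; "schedule" is the Airflow 2.4+ parameter name
    catchup=False,
):
    # Submit the PySpark ETL job; {{ ds }} is Airflow's templated run date.
    BashOperator(
        task_id="run_orders_etl",
        bash_command=(
            "spark-submit --master yarn "
            "/opt/jobs/orders_etl.py --run-date {{ ds }}"
        ),
    )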

More Info

Job Type:
Industry:
Employment Type:
Open to candidates from: India

About Company

We are a team of experienced technology solution providers with decades of industry experience. With an in-depth understanding of the business lifecycle and of digital technologies, we collaborate successfully with our clients to fulfill their talent needs. With a deep focus on skills ranging from digital product development to Cloud, Data, DevOps, and Automation, our team is made up of technical talent hunters.

Job ID: 129073823
