
ITC Infotech India Limited

Databricks - Performance Tuning

5-8 Years
  • Posted 12 hours ago

Job Description

Position Overview:

We are seeking a highly skilled Senior PySpark Developer with extensive experience in Apache Spark development. The ideal candidate will have a minimum of 5 years of hands-on experience with PySpark and Spark, demonstrating strong expertise in building and optimizing data processing pipelines for high-performance analytics.

Key Responsibilities

  • Develop and Optimize: Design, implement, and optimize robust data processing pipelines using PySpark for large-scale data processing tasks.
  • Data Transformation: Collaborate with data teams to transform and aggregate data from various sources, ensuring data quality and integrity throughout the process.
  • Performance Tuning: Analyze and tune Spark applications to improve performance and efficiency, employing best practices for Spark configuration and resource management.
  • Testing and Validation: Conduct comprehensive testing and validation of data workflows, ensuring the accuracy and reliability of processed data.
  • Technical Collaboration: Work closely with data engineers, data scientists, and business analysts to gather requirements and provide technical solutions tailored to data analytics needs.
  • Documentation: Maintain clear documentation of data processing workflows, Spark applications, and best practices to support knowledge sharing within the team.

Qualifications

  • Educational Background: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • Experience: 5+ years of hands-on experience in PySpark and Spark development, with a strong emphasis on Spark engineering and performance optimization.
  • Technical Proficiency: In-depth knowledge of Apache Spark, including Spark SQL, DataFrames, RDDs, and Spark Streaming.
  • Programming Skills: Proficient in Python, with solid experience in data manipulation and ETL processes.
  • Big Data Frameworks: Familiarity with big data technologies such as Hadoop and with cloud data storage solutions (AWS/Azure).
  • SQL Knowledge: Strong SQL skills for querying and managing data within Spark.
  • Problem-Solving: Excellent analytical and problem-solving skills with the ability to troubleshoot complex Spark applications.
  • Domain Knowledge: Prior experience in the finance or investment banking domain is highly desirable.

More Info

Open to candidates from: Indian

Job ID: 106933893
