Job Overview
We are looking for a skilled Data Engineer to join our team, working closely with cross-functional engineering, data science, and product teams. In this role, you will design, build, and optimize scalable data pipelines across both batch and streaming systems.
You will play a critical role in delivering high-quality, high-performance data products that power analytics, machine learning, personalization, and real-time business operations. The role also focuses on modernizing data platforms, improving reliability, and maintaining strong data quality standards.
Key Responsibilities
- Design, develop, and maintain scalable and reliable data pipelines for data ingestion, transformation, and integration
- Build and optimize batch data processing workflows using PySpark and SQL
- Support and enhance real-time/streaming pipelines using Kafka or similar technologies
- Improve pipeline performance, scalability, and cost efficiency across large datasets
- Implement automated data quality checks, validation frameworks, and regression testing
- Create and review architectural designs and ensure alignment with engineering standards
- Collaborate with data scientists, product managers, and engineering teams to deliver production-ready solutions
- Monitor, troubleshoot, and resolve data pipeline issues in production and non-production environments
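To give candidates a flavor of the data-quality work described above, here is a minimal, hypothetical sketch of an automated row-validation check in plain Python. All record and column names are illustrative; production pipelines in this role would express equivalent checks over PySpark DataFrames.

```python
from dataclasses import dataclass, field

@dataclass
class CheckResult:
    name: str
    passed: bool
    failures: list = field(default_factory=list)

def check_not_null(rows, column):
    """Flag rows where `column` is missing or None."""
    bad = [r for r in rows if r.get(column) is None]
    return CheckResult(f"not_null:{column}", not bad, bad)

def check_in_range(rows, column, lo, hi):
    """Flag rows where a present `column` value falls outside [lo, hi]."""
    bad = [r for r in rows if r.get(column) is not None
           and not (lo <= r[column] <= hi)]
    return CheckResult(f"in_range:{column}", not bad, bad)

# Illustrative usage on a toy batch of records.
batch = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": None},   # fails the not-null check
    {"user_id": 3, "age": 210},    # fails the range check
]
results = [
    check_not_null(batch, "age"),
    check_in_range(batch, "age", 0, 120),
]
for r in results:
    print(r.name, "OK" if r.passed else f"FAILED ({len(r.failures)} rows)")
```

Checks like these are typically wired into the pipeline as gating steps, so a failing batch is quarantined rather than propagated downstream.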
Required Skills
- Data Processing: Strong experience with PySpark, SQL, Spark architecture, and performance tuning
- Programming: Python (primary language)
- Cloud Platforms: Databricks, Microsoft Azure
- Streaming: Kafka or similar (nice to have)
- Version Control & CI/CD: Git, GitHub, GitHub Actions, CI/CD practices
- Collaboration Tools: JIRA, Confluence, MS Teams
Preferred Qualifications
- Strong understanding of distributed systems and modern data architecture patterns
- Experience with data modeling and scalable data design
- Ability to write clean, maintainable, and testable code
- Hands-on experience with data quality frameworks and testing strategies
- Proven ability to troubleshoot and resolve complex data issues
- Experience working in Agile/Scrum environments
- Strong communication skills with the ability to explain technical concepts and trade-offs
Key Traits
- Proactive in identifying improvements and reducing technical debt
- Strong ownership and accountability mindset
- Collaborative team player with cross-functional exposure
- Detail-oriented with a focus on data quality and reliability
Nice to Have
- Experience with real-time data processing use cases
- Exposure to machine learning data pipelines or feature engineering
- Knowledge of cost optimization strategies in cloud data platforms
Skills: PySpark, Databricks, Azure