JOB OVERVIEW
We are seeking a Data Engineer who can work closely with cross-functional engineering, data science, and product teams to design, build, and enhance scalable data pipelines across batch and streaming systems. This role is responsible for maintaining high-quality, high-performance data products that support analytics, machine learning, personalization, and real-time business operations.
The Data Engineer will contribute to quarterly business and technical objectives by modernizing core data assets, improving operational reliability, and ensuring best-in-class data quality standards.
KEY RESPONSIBILITIES
- Design, build, and maintain scalable, efficient, and reliable data pipelines for ingestion, transformation, and integration across diverse data sources and destinations.
- Develop and optimize batch workflows (PySpark, SQL, orchestration) and support real-time/streaming pipelines (Kafka or similar) when applicable.
- Improve pipeline performance, cost efficiency, and scalability across large and complex datasets.
- Implement and maintain automated data quality checks, regression testing, and validation frameworks to ensure accuracy, reliability, and compliance with organizational standards.
- Draft and review architectural diagrams and supporting documentation to ensure clarity and alignment across engineering teams.
- Work closely with engineering, product, and data science partners to deliver high-quality, production-ready data solutions.
REQUIRED SKILLS
- Data Processing: PySpark, SQL, Spark engine/architecture knowledge, performance tuning
- Programming: Python (nice to have)
- Cloud & Platforms: Databricks, Azure
- Streaming: Kafka or similar streaming platforms (nice to have)
- Version Control / CI-CD: Git, GitHub, GitHub Actions, CI/CD best practices
- Collaboration: JIRA, Confluence, MS Teams
PREFERRED TRAITS
- Understanding of modern software design patterns, data modeling practices, and distributed systems fundamentals.
- Ability to write clean, testable, scalable code following engineering best practices.
- Follows established architecture patterns and engineering standards across the data platform.
- Proactively suggests improvements to minimize tech debt and increase reliability.
- Capable of identifying, triaging, and resolving data defects in both production and non-production environments.
- Partners with QA/automation teams to implement functional and data testing strategies.
- Experience working in Agile environments with cross-functional engineering teams of five or more.
- Collaborates effectively with outside teams to support product adoption and operational stability.
- Demonstrates strong technical communication skills: can explain trade-offs, ask the right questions, and give and receive feedback constructively.
Skills: Spark, PySpark, SQL