
zorba ai

PySpark Data Engineer

  • Posted 2 hours ago

Job Description

Job Overview

We are looking for a skilled Data Engineer to join our team and collaborate with cross-functional engineering, data science, and product teams. In this role, you will design, build, and optimize scalable data pipelines across both batch and streaming systems.

You will play a critical role in delivering high-quality, high-performance data products that power analytics, machine learning, personalization, and real-time business operations. The role also focuses on modernizing data platforms, improving reliability, and maintaining strong data quality standards.

Key Responsibilities

  • Design, develop, and maintain scalable and reliable data pipelines for data ingestion, transformation, and integration
  • Build and optimize batch data processing workflows using PySpark and SQL
  • Support and enhance real-time/streaming pipelines using Kafka or similar technologies
  • Improve pipeline performance, scalability, and cost efficiency across large datasets
  • Implement automated data quality checks, validation frameworks, and regression testing
  • Create and review architectural designs and ensure alignment with engineering standards
  • Collaborate with data scientists, product managers, and engineering teams to deliver production-ready solutions
  • Monitor, troubleshoot, and resolve data pipeline issues in production and non-production environments

Required Skills

  • Data Processing: Strong experience with PySpark, SQL, Spark architecture, and performance tuning
  • Programming: Python (preferred)
  • Cloud Platforms: Databricks, Microsoft Azure
  • Streaming: Kafka or similar (nice to have)
  • Version Control & CI/CD: Git, GitHub, GitHub Actions, CI/CD practices
  • Collaboration Tools: JIRA, Confluence, MS Teams

Preferred Qualifications

  • Strong understanding of distributed systems and modern data architecture patterns
  • Experience with data modeling and scalable data design
  • Ability to write clean, maintainable, and testable code
  • Hands-on experience with data quality frameworks and testing strategies
  • Proven ability to troubleshoot and resolve complex data issues
  • Experience working in Agile/Scrum environments
  • Strong communication skills with the ability to explain technical concepts and trade-offs

Key Traits

  • Proactive in identifying improvements and reducing technical debt
  • Strong ownership and accountability mindset
  • Collaborative team player with cross-functional exposure
  • Detail-oriented with a focus on data quality and reliability

Nice to Have

  • Experience with real-time data processing use cases
  • Exposure to machine learning data pipelines or feature engineering
  • Knowledge of cost optimization strategies in cloud data platforms

Skills: PySpark, Databricks, Azure


Job ID: 146982175
