JOB OVERVIEW
We are seeking a Data Engineer who can work closely with cross-functional engineering, data science, and product teams to design, build, and enhance scalable data pipelines across batch and streaming systems. This role is responsible for maintaining high-quality, high-performance data products that support analytics, machine learning, personalization, and real-time business operations.
The Data Engineer will contribute to quarterly business and technical objectives by modernizing core data assets, improving operational reliability, and ensuring best-in-class data quality standards.
KEY RESPONSIBILITIES
- Design, build, and maintain scalable, efficient, and reliable data pipelines for ingestion, transformation, and integration across diverse data sources and destinations.
- Develop and optimize batch workflows (PySpark, SQL, orchestration) and support real-time/streaming pipelines (Kafka or similar) when applicable.
- Improve pipeline performance, cost efficiency, and scalability across large and complex datasets.
- Implement and maintain automated data quality checks, regression testing, and validation frameworks to ensure accuracy, reliability, and compliance with organizational standards.
- Draft and review architectural diagrams and supporting documentation to ensure clarity and alignment across engineering teams.
- Work closely with engineering, product, and data science partners to deliver high-quality, production-ready data solutions.
REQUIRED SKILLS
- Data Processing: PySpark, SQL, Spark engine/architecture knowledge, performance tuning
- Programming: Python (nice to have)
- Cloud & Platforms: Databricks, Azure
- Streaming: Kafka or similar streaming platforms (nice to have)
- Version Control / CI-CD: Git, GitHub, GitHub Actions, CI/CD best practices
- Collaboration: JIRA, Confluence, MS Teams
PREFERRED TRAITS
- Understanding of modern software design patterns, data modeling practices, and distributed systems fundamentals.
- Ability to write clean, testable, scalable code following engineering best practices.
- Follows established architecture patterns and engineering standards across the data platform.
- Proactively suggests improvements to minimize tech debt and increase reliability.
- Capable of identifying, triaging, and resolving data defects in both production and non-production environments.
- Partners with QA/automation teams to implement functional and data testing strategies.
- Experience working in Agile environments with cross-functional engineering teams of five or more.
- Collaborates effectively with outside teams to support product adoption and operational stability.
- Demonstrates strong technical communication skills: can explain trade-offs, ask the right questions, and give and receive feedback constructively.
Skills: Spark, PySpark, SQL