Job Overview
We are looking for a skilled Data Engineer to join our team, working closely with cross-functional engineering, data science, and product teams. In this role, you will design, build, and optimize scalable data pipelines across both batch and streaming systems.
You will play a critical role in delivering high-quality, high-performance data products that power analytics, machine learning, personalization, and real-time business operations. The role also focuses on modernizing data platforms, improving reliability, and maintaining strong data quality standards.
Key Responsibilities
- Design, develop, and maintain scalable and reliable data pipelines for data ingestion, transformation, and integration
- Build and optimize batch data processing workflows using PySpark and SQL
- Support and enhance real-time/streaming pipelines using Kafka or similar technologies
- Improve pipeline performance, scalability, and cost efficiency across large datasets
- Implement automated data quality checks, validation frameworks, and regression testing
- Create and review architectural designs and ensure alignment with engineering standards
- Collaborate with data scientists, product managers, and engineering teams to deliver production-ready solutions
- Monitor, troubleshoot, and resolve data pipeline issues in production and non-production environments
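To give candidates a flavor of the data-quality work described above, here is a minimal, hypothetical sketch of an automated row-validation check in plain Python. All record and column names are illustrative; production pipelines in this role would express equivalent checks over PySpark DataFrames.

```python
from dataclasses import dataclass, field

@dataclass
class CheckResult:
    name: str
    passed: bool
    failures: list = field(default_factory=list)

def check_not_null(rows, column):
    """Flag rows where `column` is missing or None."""
    bad = [r for r in rows if r.get(column) is None]
    return CheckResult(f"not_null:{column}", not bad, bad)

def check_in_range(rows, column, lo, hi):
    """Flag rows where a present `column` value falls outside [lo, hi]."""
    bad = [r for r in rows if r.get(column) is not None
           and not (lo <= r[column] <= hi)]
    return CheckResult(f"in_range:{column}", not bad, bad)

# Illustrative usage on a toy batch of records.
batch = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": None},   # fails the not-null check
    {"user_id": 3, "age": 210},    # fails the range check
]
results = [
    check_not_null(batch, "age"),
    check_in_range(batch, "age", 0, 120),
]
for r in results:
    print(r.name, "OK" if r.passed else f"FAILED ({len(r.failures)} rows)")
```

Checks like these are typically wired into the pipeline as gating steps, so a failing batch is quarantined rather than propagated downstream.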
Required Skills
- Data Processing: Strong experience with PySpark, SQL, Spark architecture, and performance tuning
- Programming: Python (primary language)
- Cloud Platforms: Databricks, Microsoft Azure
- Streaming: Kafka or similar (nice to have)
- Version Control & CI/CD: Git, GitHub, GitHub Actions, CI/CD practices
- Collaboration Tools: JIRA, Confluence, MS Teams
Preferred Qualifications
- Strong understanding of distributed systems and modern data architecture patterns
- Experience with data modeling and scalable data design
- Ability to write clean, maintainable, and testable code
- Hands-on experience with data quality frameworks and testing strategies
- Proven ability to troubleshoot and resolve complex data issues
- Experience working in Agile/Scrum environments
- Strong communication skills with the ability to explain technical concepts and trade-offs
Key Traits
- Proactive in identifying improvements and reducing technical debt
- Strong ownership and accountability mindset
- Collaborative team player with cross-functional exposure
- Detail-oriented with a focus on data quality and reliability
Nice to Have
- Experience with real-time data processing use cases
- Exposure to machine learning data pipelines or feature engineering
- Knowledge of cost optimization strategies in cloud data platforms
Skills: PySpark, Databricks, Azure