About the Role
We are looking for a Data Engineer with strong experience in Spark (PySpark), SQL, and data pipeline architecture. You will play a critical role in designing, building, and optimizing data workflows that enable scalable analytics and real-time insights. The ideal candidate is hands-on, detail-oriented, and passionate about crafting reliable data solutions while collaborating with cross-functional teams.
Responsibilities
- Design and architect scalable and efficient data pipelines for batch and real-time processing.
- Develop and optimize solutions using Spark (PySpark) and SQL.
- Ensure data pipelines are reliable, maintainable, and well-tested.
- Work with stakeholders to understand business requirements and translate them into data-driven solutions.
- Collaborate with cross-functional teams to ensure data quality, availability, and performance.
- Stay current with emerging data engineering tools and practices.
Must-Have Skills
- Strong expertise in Spark (PySpark).
- Proficiency in SQL (query optimization, performance tuning, complex joins).
- Hands-on experience in designing and architecting data pipelines.
- Excellent communication and collaboration skills.
Good to Have
- Experience with data streaming platforms (Kafka, Kinesis, etc.).
- Proficiency in Python for data engineering tasks.
- Exposure to Databricks and Azure cloud services.
- Knowledge of HTAP systems, Debezium, and the 3PL/logistics domain (having most of these skills would be a strong plus).
- Familiarity with orchestration frameworks such as Apache Airflow or Apache NiFi.