About the Role
As a Data Engineer II at Baazi, you will contribute to building and optimizing scalable data pipelines and lakehouse systems that power analytics and product insights across the organization. You'll work hands-on across our AWS-based ecosystem, develop robust data workflows, and ensure high standards of data quality, performance, and reliability.
Key Responsibilities
- Build and optimize scalable data pipelines and lakehouse components using Apache Iceberg or Hudi.
- Develop ETL/ELT workflows on AWS using Glue, EMR, Lambda, Redshift, and other platform services.
- Write clean, modular, reusable code using PySpark, Python, and SQL.
- Manage and enhance orchestration workflows with Airflow to ensure reliability and scalability.
- Collaborate with analytics, product, and engineering teams to maintain unified and consistent data models.
- Participate in performance tuning, cost optimization, and the ongoing improvement of our AWS data infrastructure.
- Implement and follow best practices in data quality, cataloging, and metadata management.
- Contribute to code reviews and engineering discussions to maintain high technical standards.
Required Skills & Experience
- 2–4 years of experience in data engineering with strong exposure to large-scale data systems.
- 2+ years of hands-on experience in PySpark.
- Solid understanding of the AWS data ecosystem: Glue, EMR, S3, Lambda, Redshift, CloudWatch.
- Practical experience working with Apache Iceberg or Hudi (Iceberg preferred).
- Strong programming skills in Python and PySpark, and a solid command of SQL.
- Experience working with Airflow for scheduling and orchestration.
- Understanding of distributed systems, data modeling, and data governance principles.
- Exposure to containerized environments (Kubernetes, Docker) is a plus.
- Ability to work closely with business and technical teams to deliver scalable solutions.