Design, implement, and optimize data pipelines using Apache Spark and Databricks to ingest, process, and transform large-scale structured and unstructured datasets (a PySpark sketch follows this list).
Develop, schedule, and monitor ETL workflows and DAGs using orchestration tools such as Apache Airflow or Jenkins (an example DAG is sketched below).
Build and maintain data models, tables, and schemas optimized for analytics and reporting in cloud-based data warehouses (a table-definition sketch appears below).
Write clean, reusable, and modular Python code following established best practices and coding standards (a small example follows below).
Use AWS Glue for metadata management, data cataloging, and ETL transformation jobs (a boto3 catalog sketch appears below).
Collaborate with analytics and product teams to translate data needs into engineering solutions.
Ensure data quality, reliability, and observability through unit testing, logging, alerting, and documentation (a data-quality check sketch appears below).
Support version control and CI/CD practices for data pipelines.
Participate in code reviews and contribute to team knowledge sharing.
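
As a rough illustration of the Spark/Databricks pipeline work above, the sketch below ingests raw JSON events, normalizes a timestamp, and writes a partitioned daily summary. The bucket paths and the column names (order_ts, customer_id, order_total) are assumptions for illustration only, not an actual schema.

```python
# A rough PySpark sketch: ingest raw JSON events, normalize the timestamp,
# and write a partitioned daily summary for analytics. All paths, table and
# column names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_ingest").getOrCreate()

# Read raw events from a hypothetical landing path.
raw = spark.read.json("s3://example-bucket/raw/orders/")

daily_summary = (
    raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date", "customer_id")
    .agg(
        F.count("*").alias("order_count"),
        F.sum("order_total").alias("revenue"),
    )
)

# Partitioned Parquet for downstream reporting; Delta would be typical on Databricks.
daily_summary.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_order_summary/"
)
```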
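
For the orchestration item, a minimal Airflow DAG might look like the following; the DAG id, schedule, and placeholder callables are assumptions, not a real workflow.

```python
# A minimal Airflow DAG sketch (Airflow 2.x); the DAG id, schedule, and
# placeholder callables are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract() -> None:
    print("extract step")  # placeholder for a real extraction task


def transform() -> None:
    print("transform step")  # placeholder for a real transformation task


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task  # transform runs only after extract succeeds
```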
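
For the data-modeling item, one way to define an analytics-friendly table is via Spark SQL, assuming a Databricks/Delta Lake environment; the database, table, columns, and partitioning choice here are illustrative only.

```python
# A sketch of an analytics table definition via Spark SQL, assuming a
# Databricks / Delta Lake environment; all names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema_setup").getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.daily_order_summary (
        order_date   DATE,
        customer_id  STRING,
        order_count  BIGINT,
        revenue      DECIMAL(18, 2)
    )
    USING DELTA
    PARTITIONED BY (order_date)
""")
```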
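
For the clean-code item, a small typed and documented transform function illustrates the reusable style intended; the function and column names are hypothetical.

```python
# A small illustration of a modular, typed, documented transform; the
# function and column names are hypothetical.
from pyspark.sql import DataFrame, functions as F


def add_order_date(df: DataFrame, ts_col: str = "order_ts") -> DataFrame:
    """Return df with an order_date column derived from a timestamp column.

    Keeping transforms as small, pure functions makes them easy to unit
    test and to reuse across pipelines.
    """
    return df.withColumn("order_date", F.to_date(F.col(ts_col)))
```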
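
For the AWS Glue item, a boto3 sketch of reading table metadata from the Glue Data Catalog and triggering a crawler; the database, table, and crawler names are assumptions.

```python
# A boto3 sketch of working with the AWS Glue Data Catalog; the database,
# table, and crawler names are assumptions.
import boto3

glue = boto3.client("glue")

# Fetch catalog metadata for a table and print its column schema.
response = glue.get_table(DatabaseName="analytics", Name="daily_order_summary")
for column in response["Table"]["StorageDescriptor"]["Columns"]:
    print(column["Name"], column["Type"])

# Start a crawler (assumed to already exist) so new partitions get cataloged.
glue.start_crawler(Name="daily_order_summary_crawler")
```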
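
For the data-quality item, a minimal sketch of a logged null-fraction check paired with a pytest-style unit test; the metric and threshold are illustrative, not a specific framework.

```python
# A minimal sketch of a logged data-quality check with a pytest-style unit
# test; the metric and threshold are illustrative assumptions.
import logging

logger = logging.getLogger(__name__)


def null_fraction(values: list) -> float:
    """Return the fraction of None values in a column-like list."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def check_null_fraction(values: list, threshold: float = 0.05) -> bool:
    """Log and return whether the null fraction stays at or under the threshold."""
    fraction = null_fraction(values)
    if fraction > threshold:
        logger.warning("null fraction %.3f exceeds threshold %.3f", fraction, threshold)
        return False
    logger.info("null fraction %.3f within threshold %.3f", fraction, threshold)
    return True


def test_check_null_fraction_flags_excess_nulls():
    # pytest collects test_-prefixed functions automatically.
    assert check_null_fraction([1, None, None, None], threshold=0.5) is False
    assert check_null_fraction([1, 2, None, 4], threshold=0.5) is True
```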