We are looking for an experienced Airflow Orchestration and Ingestion Engineer to support the migration of legacy data workflows from Apache Oozie to Apache Airflow, as part of a broader transition from an on-prem Hadoop environment to a modern cloud-based data platform on Databricks and AWS. This is a critical role focused on reengineering data pipeline orchestration, automation, and deployment within a cloud-native framework.
Key Responsibilities:
Workflow Migration:
- Convert Oozie workflows to Airflow DAGs using Python (a minimal conversion sketch follows this list).
- Build reusable, modular Airflow pipelines for ingestion, transformation, and scheduling.
- Ensure accurate, one-to-one workflow alignment so that business processes are not disrupted.
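For context, a minimal sketch of the kind of conversion involved, assuming Airflow 2.x: an Oozie ingest-then-transform action chain re-expressed as an Airflow DAG. The DAG id, task names, and callables are hypothetical placeholders, not taken from any existing workflow.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_raw_data(**context):
    """Placeholder for logic previously run by an Oozie shell/ssh action."""
    print("ingesting raw data")


def transform_data(**context):
    """Placeholder for logic previously run by an Oozie Hive/Spark action."""
    print("transforming data")


with DAG(
    dag_id="legacy_oozie_workflow_migrated",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # replaces the Oozie coordinator frequency (Airflow 2.4+ syntax)
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_raw_data", python_callable=ingest_raw_data)
    transform = PythonOperator(task_id="transform_data", python_callable=transform_data)

    ingest >> transform  # replaces Oozie's <ok to="..."> transition
```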
Cloud Data Platform Transition:
- Work with engineering teams to migrate Hadoop workloads to Databricks on AWS.
- Leverage Airflow to orchestrate data pipelines across AWS services such as S3, EMR, Glue, and Redshift (see the sketch after this list).
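As an illustration of that orchestration, a minimal sketch assuming the apache-airflow-providers-amazon package: wait for a landing file in S3, then trigger a Glue job. The bucket, key, and job names are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    dag_id="aws_ingestion_example",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    wait_for_landing_file = S3KeySensor(
        task_id="wait_for_landing_file",
        bucket_name="example-landing-bucket",    # hypothetical bucket
        bucket_key="raw/{{ ds }}/data.parquet",  # key templated by execution date
        aws_conn_id="aws_default",
    )

    run_glue_transform = GlueJobOperator(
        task_id="run_glue_transform",
        job_name="example-glue-transform",  # hypothetical Glue job
        aws_conn_id="aws_default",
    )

    wait_for_landing_file >> run_glue_transform
```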
Pipeline Optimization:
- Enhance pipeline performance for throughput and latency in the AWS ecosystem.
- Integrate Airflow with Databricks for transformation and analytics tasks (a submission sketch follows this list).
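One common integration point is submitting Databricks runs from Airflow. A minimal sketch, assuming the apache-airflow-providers-databricks package and a configured Databricks connection; the cluster spec and notebook path are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id="databricks_transform_example",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_transform_notebook = DatabricksSubmitRunOperator(
        task_id="run_transform_notebook",
        databricks_conn_id="databricks_default",  # assumes a configured connection
        new_cluster={
            "spark_version": "13.3.x-scala2.12",  # hypothetical runtime version
            "node_type_id": "i3.xlarge",          # hypothetical AWS node type
            "num_workers": 2,
        },
        notebook_task={"notebook_path": "/Shared/transform_example"},  # hypothetical path
    )
```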
Monitoring & Error Handling:
- Implement retry logic, exception handling, and alerting in Airflow (see the sketch after this list).
- Set up observability with tools such as CloudWatch, Prometheus, or Airflow's native monitoring.
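A minimal sketch of the retry and alerting surface Airflow exposes: per-DAG default_args with retries, backoff, and a failure callback. The alert wiring behind notify_on_failure is a hypothetical placeholder.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_on_failure(context):
    """Placeholder: forward the failed task's details to an alert channel."""
    print(f"Task failed: {context['task_instance'].task_id}")


default_args = {
    "retries": 3,                          # re-run transient failures
    "retry_delay": timedelta(minutes=5),   # wait between attempts
    "retry_exponential_backoff": True,     # lengthen the wait on each retry
    "on_failure_callback": notify_on_failure,
}

with DAG(
    dag_id="monitored_pipeline_example",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    flaky_step = PythonOperator(
        task_id="flaky_step",
        python_callable=lambda: print("work that may fail transiently"),
    )
```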
Collaboration & Documentation:
- Collaborate with data architects, DevOps, and cloud teams.
- Document orchestration logic, best practices, and the migration process.
CI/CD and Infrastructure Automation:
- Develop CI/CD pipelines using Jenkins and Terraform.
- Automate DAG deployment and infrastructure provisioning via IaC.
- Integrate validation steps into deployment workflows (a DAG import check is sketched after this list).
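As one example of such a validation step, a minimal sketch of a CI check (e.g., run via pytest in a Jenkins stage) that fails the build if any DAG fails to import; the dags/ path is an assumption about the repository layout.

```python
from airflow.models import DagBag


def test_dags_import_cleanly():
    # Load every DAG file from the repo and surface import errors.
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    assert not dag_bag.import_errors, f"DAG import errors: {dag_bag.import_errors}"
    assert dag_bag.dags, "No DAGs were discovered in dags/"
```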
Required Skills and Experience:
- Strong expertise in Apache Airflow, including complex DAG design and orchestration.
- Prior experience with Apache Oozie and workflow migration.
- Proficiency in Python for Airflow DAG development.
- Hands-on experience with Hadoop ecosystems (e.g., HDFS, Hive, Spark).
- Knowledge of CI/CD tools such as Jenkins and Infrastructure as Code (IaC) with Terraform.
- Experience with Databricks (preferably on AWS) and big data orchestration.
- Solid understanding of AWS services: S3, EMR, Glue, Lambda, Redshift, IAM.
- Familiarity with container tools such as Docker or Kubernetes.
Preferred Qualifications:
- Experience with large-scale cloud migrations, especially Hadoop-to-Databricks.
- Proficiency in Spark / PySpark for big data transformation.
- AWS or Databricks certifications are a plus.
- Familiarity with Git and workflow monitoring platforms.