Job Summary
We are seeking a highly skilled and experienced Data Engineer to lead the design, development, and maintenance of scalable ETL pipelines. The ideal candidate will have deep expertise in the Azure data ecosystem, specifically leveraging Python, Databricks, and Azure Data Factory (ADF) to transform raw data into actionable insights. You will play a key role in optimizing data architecture and mentoring junior developers.
Key Responsibilities
- ETL Pipeline Development: Design, build, and orchestrate robust ETL/ELT pipelines using Azure Data Factory (ADF) and Azure Databricks to ingest data from various on-premises and cloud sources.
- Data Transformation: Utilize Python (PySpark) and SQL to perform complex data transformations, cleaning, and validation within Databricks notebooks.
- Architecture & Optimization: Collaborate with architects to define data models and optimize pipeline performance (latency, throughput, and cost) for large-scale datasets.
- Data Lake Management: Manage and organize data within Azure Data Lake Storage (ADLS Gen2), implementing Delta Lake best practices for ACID transactions and time travel.
- Quality & Governance: Implement automated testing, data quality checks, and monitoring to ensure data integrity and availability.
- CI/CD & DevOps: Manage code versioning and deployment pipelines using Azure DevOps, Git, and CI/CD methodologies.
- Mentorship: Guide junior engineers, conduct code reviews, and establish best practices for coding standards and documentation.
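To give a flavor of the data-quality work described above, here is a minimal, pure-Python sketch of the kind of validation logic that would typically run (as PySpark expectations) in a Databricks notebook before data lands in Delta Lake. All field names and rules here are hypothetical, for illustration only:

```python
# Minimal data-quality check sketch (hypothetical records and rules).
# In production this would typically be implemented with PySpark inside
# a Databricks notebook, writing rejected rows to a quarantine table.

def validate_records(records):
    """Split records into valid rows and rows failing basic quality rules."""
    valid, rejected = [], []
    for row in records:
        errors = []
        if not row.get("customer_id"):                      # required key
            errors.append("missing customer_id")
        if row.get("amount") is None or row["amount"] < 0:  # no negative amounts
            errors.append("invalid amount")
        if errors:
            rejected.append({**row, "_errors": errors})
        else:
            valid.append(row)
    return valid, rejected

sample = [
    {"customer_id": "C1", "amount": 120.5},
    {"customer_id": "",   "amount": 10.0},   # fails: missing key
    {"customer_id": "C3", "amount": -4.0},   # fails: negative amount
]
valid, rejected = validate_records(sample)
```

Routing failed rows to a quarantine structure (rather than dropping them) preserves auditability, which supports the monitoring and governance responsibilities listed above.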
Required Qualifications
- Experience: 6 to 9 years of proven experience in Data Engineering and ETL development.
- Core Technologies:
  - Python: Advanced proficiency in Python for data processing, automation, and scripting.
  - Databricks: Strong hands-on experience with Azure Databricks, including cluster management, job scheduling, and performance tuning.
  - Spark: Deep understanding of Apache Spark architecture and PySpark.
  - ADF: Extensive experience creating pipelines, linked services, and datasets in Azure Data Factory.
  - Cloud Storage: Proficiency with Azure Data Lake Storage (ADLS Gen2) and Blob Storage.
- Database Skills: Strong SQL skills for querying and analyzing data in data warehouses (e.g., Azure Synapse Analytics, Snowflake, or SQL Server).
- Problem Solving: Ability to troubleshoot complex data issues and optimize slow-running queries or jobs.
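As a small illustration of the SQL analysis in scope, the sketch below runs an aggregation query against an in-memory SQLite database standing in for a warehouse such as Synapse or Snowflake. The table and column names are hypothetical:

```python
import sqlite3

# Hypothetical orders table; SQLite stands in for a cloud data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0)],
)

# Revenue per region -- the kind of query a candidate would profile and
# tune on a real warehouse (e.g., via indexing or partition pruning).
rows = conn.execute(
    "SELECT region, SUM(amount) AS revenue "
    "FROM orders GROUP BY region ORDER BY revenue DESC"
).fetchall()
```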
Preferred Skills (Adjacent Tools)
- Experience with Unity Catalog for data governance.
- Knowledge of Airflow or other orchestration tools.
- Familiarity with containerization (Docker/Kubernetes).
- Experience with NoSQL databases (Cosmos DB, MongoDB).
- Understanding of Event Hubs or Kafka for real-time data streaming.
Soft Skills
- Strong communication skills to articulate technical concepts to non-technical stakeholders.
- Agile mindset with experience working in Scrum/Kanban teams.
- Proactive approach to learning new technologies and tools.