Job Summary
We are seeking a skilled Data Engineer with expertise in data lake architecture and the Azure cloud platform to develop, deploy, and optimize data-driven solutions. You will play a pivotal role in turning raw data into actionable insights that support strategic decision-making across the organization.
Key Responsibilities
- Develop and optimize scalable data pipelines using Python and PySpark
- Build and orchestrate data workflows with Azure Data Factory
- Design and implement solutions using Azure Databricks and Synapse Analytics
- Manage and maintain data storage solutions in Azure Data Lake Storage Gen2, applying cost-efficient design such as access tiering and lifecycle policies
- Implement and manage CI/CD pipelines using tools such as Azure DevOps, GitHub Actions, or Jenkins, and automate supporting workflows with Azure Logic Apps
- Model and maintain Medallion Architecture across Bronze, Silver, and Gold layers based on business requirements (see the sketch after this list)
- Collaborate with data scientists, analysts, and business stakeholders to ensure reliable data availability
- Monitor and optimize performance and cost efficiency of data solutions across the ecosystem
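To give a concrete sense of the pipeline work above, here is a minimal PySpark sketch of a Bronze-to-Silver promotion in a Medallion layout. The storage account, container, paths, and column names (event_id, event_ts) are hypothetical placeholders rather than an actual project layout, and writing Delta format assumes a Databricks-style runtime.

```python
# Minimal sketch: promote raw Bronze events to a cleansed Silver table.
# All paths, account/container names, and columns are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

# Bronze: raw JSON landed in ADLS Gen2 (hypothetical account/container).
bronze = spark.read.json("abfss://lake@example.dfs.core.windows.net/bronze/events/")

# Silver: deduplicate, enforce types, and drop malformed rows.
silver = (
    bronze
    .dropDuplicates(["event_id"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .filter(F.col("event_id").isNotNull())
)

# Write as Delta, partitioned by date to keep downstream scans cheap.
(
    silver
    .withColumn("event_date", F.to_date("event_ts"))
    .write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save("abfss://lake@example.dfs.core.windows.net/silver/events/")
)
```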
Required Skills & Experience
- Strong proficiency in Python and PySpark for data manipulation and transformation
- Hands-on experience with Azure Data Factory, Azure Databricks, and Synapse Analytics
- Familiarity with Azure Logic Apps for workflow automation and integration alongside CI/CD tooling
- Knowledge of CI/CD tools such as:
  - Azure DevOps Pipelines
  - GitHub Actions
  - Jenkins
- Expertise in managing Azure Data Lake Storage Gen2 environments with emphasis on security and cost optimization
- Deep understanding of Medallion Architecture principles:
  - Bronze Layer: Raw data as ingested from source systems
  - Silver Layer: Cleansed, validated, and enriched data
  - Gold Layer: Aggregated, business-ready data models for analytics (see the sketch at the end of this section)
- Strong problem-solving and communication skills
- Experience with cost optimization of cloud data workloads
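As a small illustration of the Silver-to-Gold progression described above, the sketch below rolls a hypothetical Silver events table up into a Gold-layer daily summary. The table paths and metric columns (customer_id, amount) are assumptions for the example, not a prescribed schema.

```python
# Minimal sketch: build a Gold-layer daily aggregate from a Silver table.
# Paths and metric columns are hypothetical, for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("silver-to-gold").getOrCreate()

silver = spark.read.format("delta").load(
    "abfss://lake@example.dfs.core.windows.net/silver/events/"
)

# Gold: business-facing daily rollup, ready for BI or Synapse serving.
gold = (
    silver
    .groupBy("event_date", "customer_id")
    .agg(
        F.count("event_id").alias("event_count"),
        F.sum("amount").alias("total_amount"),
    )
)

(
    gold.write.format("delta")
    .mode("overwrite")
    .save("abfss://lake@example.dfs.core.windows.net/gold/daily_customer_summary/")
)
```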