Job description:
Job Title: Azure Data Engineer – Azure Data Factory, Azure Data Lake, Azure Databricks
Experience: 3+ years
Location: Pune, India (hybrid/remote as per project needs)
Shift: 6:30 AM to 3:30 PM IST (client shift may apply)
Role Summary
You will build and support Azure-based data platforms, creating pipelines for ingestion, transformation, and analytics.
You will manage data lake and warehouse layers with strong data modeling.
You will enable AI/ML workloads by preparing high-quality datasets and supporting Azure Machine Learning.
Primary Skills (Must Have)
- Azure Data Factory (ADF) – pipeline design, triggers, monitoring, error handling
- Azure Databricks (Spark / PySpark) – transformations, performance tuning, Delta Lake (where used)
- Azure Data Lake Storage (ADLS Gen2) – lake design, folder structure, partitioning
- Azure Synapse Analytics – analytics/warehouse concepts and data serving
- SQL (Advanced) – complex queries, validation, tuning
- Python – data processing and scripting (ML exposure is a plus)
- Data Modeling & ETL – strong warehouse and dimensional modeling understanding
- End-to-end integration of multiple Azure services
Key Responsibilities
1) Data Ingestion & Orchestration (Azure Data Factory)
- Design and build scalable ADF pipelines for batch and incremental loads.
- Configure linked services, datasets, triggers, and integration runtime.
- Implement retry logic, alerts, and failure handling.
- Maintain pipeline standards, parameters, and reusable templates.
- Monitor daily runs and resolve failures with proper root-cause analysis (RCA); see the run-monitoring sketch below.
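For illustration, a minimal sketch of triggering and polling an ADF pipeline run from Python with the azure-mgmt-datafactory SDK; the subscription, resource group, factory, pipeline, and parameter names are all placeholders, and real monitoring would add alerting on failure:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder values throughout
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Kick off a parameterized run, e.g. for an incremental load window.
run = client.pipelines.create_run(
    RESOURCE_GROUP, FACTORY_NAME, "<pipeline-name>",
    parameters={"loadDate": "2024-01-01"},  # hypothetical pipeline parameter
)

# Poll the run status (Queued / InProgress / Succeeded / Failed).
status = client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id).status
print(f"Pipeline run {run.run_id}: {status}")
```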
2) Data Lake Design & Storage Management (ADLS + Azure SQL)
- Design data lake layers: raw, staged, curated, consumption.
- Choose appropriate file formats (Parquet, Delta, CSV) based on workload needs.
- Apply partitioning and naming standards for performance and clarity (see the sketch after this list).
- Manage curated datasets in Azure SQL Database where required.
- Implement data retention and lifecycle policies and ensure data availability.
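As an example of the partitioning and layout standards above, a minimal PySpark sketch that writes a curated dataset to ADLS Gen2 partitioned by date; the storage account, path, and column names are illustrative, and the spark session is assumed (as in a Databricks or Synapse notebook):

```python
from pyspark.sql import functions as F

# Toy batch standing in for a staged dataset.
raw_orders_df = spark.createDataFrame(
    [(1, "2024-01-01T10:00:00", 120.0), (2, "2024-01-02T11:30:00", 80.5)],
    ["order_id", "order_ts", "amount"],
)

# Placeholder ADLS Gen2 path following a raw/staged/curated layout.
curated_path = "abfss://lake@<storage-account>.dfs.core.windows.net/curated/sales/orders"

(raw_orders_df
    .withColumn("order_date", F.to_date("order_ts"))
    .write
    .mode("overwrite")
    .partitionBy("order_date")  # partition on the common filter column for pruning
    .parquet(curated_path))
```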
3) Data Transformation & Big Data Processing (Databricks)
- Develop transformations using PySpark / Spark SQL in Databricks.
- Implement data quality checks and reconciliation rules.
- Optimize cluster usage, caching, and job performance to reduce cost.
- Implement incremental processing and upsert patterns (MERGE) where needed; see the Delta MERGE sketch below.
- Schedule and run Databricks jobs through ADF or job workflows.
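A minimal sketch of the MERGE upsert pattern mentioned above, using the Delta Lake Python API; the table path, join key, and updates_df (the incremental batch) are assumptions:

```python
from delta.tables import DeltaTable

# Placeholder path to an existing Delta table in the curated layer.
target = DeltaTable.forPath(
    spark, "abfss://lake@<storage-account>.dfs.core.windows.net/curated/customers"
)

# Upsert the incremental batch: update matched keys, insert new rows.
(target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```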
4) Data Warehousing & Analytics (Synapse)
- Build and support analytics solutions using Azure Synapse.
- Design warehouse objects and implement loading strategies.
- Support query tuning and performance improvement.
- Publish curated, trusted datasets for BI and downstream apps (example below).
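For illustration, a sketch of publishing a curated aggregate as a serving table from a Synapse (or Databricks) Spark session; the database, table, and column names are placeholders:

```python
# Aggregate a curated table into a BI-friendly shape.
daily_sales = spark.sql("""
    SELECT order_date,
           SUM(amount) AS total_amount,
           COUNT(*)    AS order_count
    FROM   curated.orders
    GROUP  BY order_date
""")

# Publish as a managed table so BI tools and downstream apps can query it.
daily_sales.write.mode("overwrite").saveAsTable("serving.daily_sales")
```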
5) Data Modeling & ETL Design
- Create logical and physical data models for reporting and analytics.
- Apply star schema / dimensional modeling where needed (see the sketch below).
- Maintain source-to-target mapping and transformation rules.
- Ensure data consistency across lake, warehouse, and BI layers.
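As a sketch of dimensional modeling in practice, a fact-table load that resolves dimension surrogate keys before writing; all table and column names are illustrative:

```python
# Resolve surrogate keys from conformed dimensions, then load the fact table.
fact_sales = spark.sql("""
    SELECT d.date_key,
           c.customer_key,
           o.order_id,
           o.amount
    FROM   staged.orders o
    JOIN   dim.date      d ON d.calendar_date = o.order_date
    JOIN   dim.customer  c ON c.customer_id   = o.customer_id
""")

fact_sales.write.mode("append").saveAsTable("dw.fact_sales")
```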
6) AI/ML Enablement (Azure Machine Learning)
- Support ML pipelines through feature preparation and dataset readiness.
- Work with Data Scientists for training and deployment support.
- Build Python scripts for model experiments when required (see the sketch below).
- Use libraries such as scikit-learn (preferred) and TensorFlow/PyTorch (good to have).
- Track model inputs, outputs, and repeatable pipeline execution.
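For illustration, a minimal scikit-learn sketch of the kind of repeatable experiment script this role supports; the synthetic dataset stands in for a prepared feature set:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features standing in for a curated training dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bundling preprocessing with the model keeps experiments repeatable.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print(f"Holdout accuracy: {model.score(X_test, y_test):.3f}")
```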
7) SQL, Python & Engineering Practices
- Write optimized SQL for validation, reconciliation, and transformations (reconciliation sketch below).
- Write clean Python code for automation and data processing.
- Use Git with good branching and PR review practices.
- Support CI/CD practices for data pipelines where the project uses them.
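A minimal reconciliation sketch in PySpark, comparing row counts and totals between the staged and curated layers; table and column names are placeholders:

```python
# Compare row counts and amount totals across layers.
recon = spark.sql("""
    SELECT 'staged'  AS layer, COUNT(*) AS row_count, SUM(amount) AS total_amount
    FROM   staged.orders
    UNION ALL
    SELECT 'curated' AS layer, COUNT(*) AS row_count, SUM(amount) AS total_amount
    FROM   curated.orders
""")

counts = {r.layer: r.row_count for r in recon.collect()}
assert counts["staged"] == counts["curated"], "Row counts diverge between layers"
```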
8) Security, Compliance & Governance
- Follow best practices for secure data handling and access control.
- Work with RBAC, managed identities, and Key Vault where applicable (see the sketch below).
- Ensure compliance with client policies and audit needs.
- Implement encryption, access boundaries, and safe data sharing.
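For illustration, a minimal sketch of pulling a connection string from Key Vault at runtime instead of hard-coding credentials; the vault URL and secret name are placeholders, and access is assumed to be granted via RBAC or an access policy:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Placeholder vault URL and secret name.
vault = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net",
    credential=DefaultAzureCredential(),
)
conn_str = vault.get_secret("sql-connection-string").value  # hypothetical secret

# Use the secret at runtime; never commit it to source control.
```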
9) Agile Delivery & Production Support
- Work in Agile/Scrum mode and deliver stories on time.
- Provide estimates and daily updates to stakeholders.
- Support production issues and perform RCA with prevention steps.
- Maintain runbooks and operational documents.
Secondary Skills (Good to Have)
- Power BI – dataset modeling, dashboards, refresh, performance basics
- Azure Functions / Logic Apps – automation and integration support
- Azure Cognitive Services – awareness for AI use cases (optional)
- Big data background: Hadoop basics, strong Spark understanding
- Monitoring tools: Log Analytics / Azure Monitor (as used in the project)
- DevOps exposure: Azure DevOps pipelines for data workloads
Tools / Technologies (Typical)
- Azure: ADF, ADLS Gen2, Databricks, Synapse, Azure SQL, Azure ML
- Languages: Python, SQL, PySpark
- Dev Tools: Git, Azure DevOps / Jira (as applicable)
- Monitoring: ADF monitor, Databricks job runs, Azure Monitor (if enabled)
Qualifications
- BE/BTech/BCA/MCA or equivalent practical experience
Soft Skills
- Clear communication and strong ownership.
- Good problem solving and troubleshooting mindset.
- Good documentation habit and disciplined delivery.
- Works well with business, platform, and security teams.