Job description:
Job Title: Azure Data Engineer – Azure Data Factory, Azure Data Lake, Azure Databricks
Experience: 3+ years
Location: Pune, India (hybrid/remote as per project needs)
Shift: 6:30 AM to 3:30 PM IST (client shift may apply)
Role Summary
You will build and support Azure-based data platforms, creating pipelines for ingestion, transformation, and analytics.
You will manage data lake and warehouse layers with strong data modeling.
You will enable AI/ML workloads by preparing high-quality datasets and supporting Azure Machine Learning.
Primary Skills (Must Have)
- Azure Data Factory (ADF) – pipeline design, triggers, monitoring, error handling
- Azure Databricks (Spark / PySpark) – transformations, performance tuning, Delta Lake (where used)
- Azure Data Lake Storage (ADLS Gen2) – lake design, folder structure, partitioning
- Azure Synapse Analytics – analytics/warehouse concepts and data serving
- SQL (Advanced) – complex queries, validation, tuning
- Python – data processing and scripting (ML exposure is a plus)
- Data Modeling & ETL – strong warehouse and dimensional modeling understanding
- End-to-end integration of multiple Azure services
Key Responsibilities
1) Data Ingestion & Orchestration (Azure Data Factory)
- Design and build scalable ADF pipelines for batch and incremental loads.
- Configure linked services, datasets, triggers, and integration runtime.
- Implement retry logic, alerts, and failure handling.
- Maintain pipeline standards, parameters, and reusable templates.
- Monitor daily runs and resolve failures with proper root-cause analysis (RCA); see the run-monitoring sketch below.
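For illustration, a minimal sketch of triggering and polling an ADF pipeline run from Python with the azure-mgmt-datafactory SDK; the subscription, resource group, factory, pipeline, and parameter names are all placeholders, and real monitoring would add alerting on failure:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder values throughout
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Kick off a parameterized run, e.g. for an incremental load window.
run = client.pipelines.create_run(
    RESOURCE_GROUP, FACTORY_NAME, "<pipeline-name>",
    parameters={"loadDate": "2024-01-01"},  # hypothetical pipeline parameter
)

# Poll the run status (Queued / InProgress / Succeeded / Failed).
status = client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id).status
print(f"Pipeline run {run.run_id}: {status}")
```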
2) Data Lake Design & Storage Management (ADLS + Azure SQL)
- Design data lake layers: raw, staged, curated, consumption.
- Choose appropriate file formats (Parquet, Delta, CSV) based on workload needs.
- Apply partitioning and naming standards for performance and clarity (see the sketch after this list).
- Manage curated datasets in Azure SQL Database where required.
- Implement data retention and lifecycle policies and ensure data availability.
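As an example of the partitioning and layout standards above, a minimal PySpark sketch that writes a curated dataset to ADLS Gen2 partitioned by date; the storage account, path, and column names are illustrative, and the spark session is assumed (as in a Databricks or Synapse notebook):

```python
from pyspark.sql import functions as F

# Toy batch standing in for a staged dataset.
raw_orders_df = spark.createDataFrame(
    [(1, "2024-01-01T10:00:00", 120.0), (2, "2024-01-02T11:30:00", 80.5)],
    ["order_id", "order_ts", "amount"],
)

# Placeholder ADLS Gen2 path following a raw/staged/curated layout.
curated_path = "abfss://lake@<storage-account>.dfs.core.windows.net/curated/sales/orders"

(raw_orders_df
    .withColumn("order_date", F.to_date("order_ts"))
    .write
    .mode("overwrite")
    .partitionBy("order_date")  # partition on the common filter column for pruning
    .parquet(curated_path))
```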
3) Data Transformation & Big Data Processing (Databricks)
- Develop transformations using PySpark / Spark SQL in Databricks.
- Implement data quality checks and reconciliation rules.
- Optimize cluster usage, caching, and job performance to reduce cost.
- Implement incremental processing and upsert patterns (MERGE) where needed; see the Delta MERGE sketch below.
- Schedule and run Databricks jobs through ADF or job workflows.
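A minimal sketch of the MERGE upsert pattern mentioned above, using the Delta Lake Python API; the table path, join key, and updates_df (the incremental batch) are assumptions:

```python
from delta.tables import DeltaTable

# Placeholder path to an existing Delta table in the curated layer.
target = DeltaTable.forPath(
    spark, "abfss://lake@<storage-account>.dfs.core.windows.net/curated/customers"
)

# Upsert the incremental batch: update matched keys, insert new rows.
(target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```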
4) Data Warehousing & Analytics (Synapse)
- Build and support analytics solutions using Azure Synapse.
- Design warehouse objects and implement loading strategies.
- Support query tuning and performance improvement.
- Publish curated, trusted datasets for BI and downstream apps (example below).
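For illustration, a sketch of publishing a curated aggregate as a serving table from a Synapse (or Databricks) Spark session; the database, table, and column names are placeholders:

```python
# Aggregate a curated table into a BI-friendly shape.
daily_sales = spark.sql("""
    SELECT order_date,
           SUM(amount) AS total_amount,
           COUNT(*)    AS order_count
    FROM   curated.orders
    GROUP  BY order_date
""")

# Publish as a managed table so BI tools and downstream apps can query it.
daily_sales.write.mode("overwrite").saveAsTable("serving.daily_sales")
```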
5) Data Modeling & ETL Design
- Create logical and physical data models for reporting and analytics.
- Apply star schema / dimensional modeling where needed (see the sketch below).
- Maintain source-to-target mapping and transformation rules.
- Ensure data consistency across lake, warehouse, and BI layers.
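As a sketch of dimensional modeling in practice, a fact-table load that resolves dimension surrogate keys before writing; all table and column names are illustrative:

```python
# Resolve surrogate keys from conformed dimensions, then load the fact table.
fact_sales = spark.sql("""
    SELECT d.date_key,
           c.customer_key,
           o.order_id,
           o.amount
    FROM   staged.orders o
    JOIN   dim.date      d ON d.calendar_date = o.order_date
    JOIN   dim.customer  c ON c.customer_id   = o.customer_id
""")

fact_sales.write.mode("append").saveAsTable("dw.fact_sales")
```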
6) AI/ML Enablement (Azure Machine Learning)
- Support ML pipelines through feature preparation and dataset readiness.
- Work with Data Scientists for training and deployment support.
- Build Python scripts for model experiments when required (see the sketch below).
- Use libraries such as scikit-learn (preferred) and TensorFlow/PyTorch (good to have).
- Track model inputs, outputs, and repeatable pipeline execution.
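For illustration, a minimal scikit-learn sketch of the kind of repeatable experiment script this role supports; the synthetic dataset stands in for a prepared feature set:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features standing in for a curated training dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bundling preprocessing with the model keeps experiments repeatable.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print(f"Holdout accuracy: {model.score(X_test, y_test):.3f}")
```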
7) SQL, Python & Engineering Practices
- Write optimized SQL for validation, reconciliation, and transformations (reconciliation sketch below).
- Write clean Python code for automation and data processing.
- Use Git with good branching and PR review practices.
- Support CI/CD practices for data pipelines where the project uses them.
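A minimal reconciliation sketch in PySpark, comparing row counts and totals between the staged and curated layers; table and column names are placeholders:

```python
# Compare row counts and amount totals across layers.
recon = spark.sql("""
    SELECT 'staged'  AS layer, COUNT(*) AS row_count, SUM(amount) AS total_amount
    FROM   staged.orders
    UNION ALL
    SELECT 'curated' AS layer, COUNT(*) AS row_count, SUM(amount) AS total_amount
    FROM   curated.orders
""")

counts = {r.layer: r.row_count for r in recon.collect()}
assert counts["staged"] == counts["curated"], "Row counts diverge between layers"
```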
8) Security, Compliance & Governance
- Follow best practices for secure data handling and access control.
- Work with RBAC, managed identities, and Key Vault where applicable (see the sketch below).
- Ensure compliance with client policies and audit needs.
- Implement encryption, access boundaries, and safe data sharing.
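For illustration, a minimal sketch of pulling a connection string from Key Vault at runtime instead of hard-coding credentials; the vault URL and secret name are placeholders, and access is assumed to be granted via RBAC or an access policy:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Placeholder vault URL and secret name.
vault = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net",
    credential=DefaultAzureCredential(),
)
conn_str = vault.get_secret("sql-connection-string").value  # hypothetical secret

# Use the secret at runtime; never commit it to source control.
```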
9) Agile Delivery & Production Support
- Work in Agile/Scrum mode and deliver stories on time.
- Provide estimates and daily updates to stakeholders.
- Support production issues and perform RCA with prevention steps.
- Maintain runbooks and operational documents.
Secondary Skills (Good to Have)
- Power BI – dataset modeling, dashboards, refresh, performance basics
- Azure Functions / Logic Apps – automation and integration support
- Azure Cognitive Services – awareness for AI use cases (optional)
- Big data background: Hadoop basics, strong Spark understanding
- Monitoring tools: Log Analytics / Azure Monitor (as used in the project)
- DevOps exposure: Azure DevOps pipelines for data workloads
Tools / Technologies (Typical)
- Azure: ADF, ADLS Gen2, Databricks, Synapse, Azure SQL, Azure ML
- Languages: Python, SQL, PySpark
- Dev Tools: Git, Azure DevOps / Jira (as applicable)
- Monitoring: ADF monitor, Databricks job runs, Azure Monitor (if enabled)
Qualifications
- BE/BTech/BCA/MCA or equivalent practical experience
Soft Skills
- Clear communication and strong ownership.
- Good problem solving and troubleshooting mindset.
- Good documentation habit and disciplined delivery.
- Works well with business, platform, and security teams.