Role overview
We're looking for a Data Engineer who thrives on building robust real-time and batch data products on Microsoft Fabric. You'll design and operate ingestion from streaming sources (Event Hubs, Service Bus, Confluent Kafka), model curated Silver/Gold layers in the Lakehouse, optimize KQL and Spark pipelines, and enable trustworthy, fast Power BI dashboards (including Direct Lake and semantic models).
What you'll do
- Design and implement scalable data pipelines (batch + streaming) from diverse sources (REST, SFTP, RDBMS, Kafka/Event Hubs/Service Bus) into a lakehouse and OneLake.
- Model and curate datasets using medallion architecture; build reusable frameworks for ingestion, schema evolution, and incremental processing.
- Write efficient transformations in Spark (PySpark/SQL) and/or KQL; create materialized views, update policies, and optimization strategies for cost/perf.
- Implement CDC, watermarking, late-arrival handling, and idempotent writes for append/merge scenarios (see the PySpark sketch after this list).
- Enforce data quality, observability, and lineage (DQ rules, expectations, SLAs, alerts, metadata catalogs).
- Apply security & governance best practices (PII hashing/tokenization, access controls, secrets management).
- Productionize workloads with orchestration (Airflow/ADF/Azure Synapse/Step Functions/Glue), CI/CD, testing, and rollout strategies.
- Partner with product/analytics teams to define SLAs, table contracts, and consumption patterns; create reliable semantic layers.
- Troubleshoot performance, skew, and reliability issues; tune storage (Delta/Parquet/Iceberg) and compute configurations.
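To give a flavor of the merge/idempotency work above, here is a minimal PySpark sketch of a deduplicated, idempotent upsert into a Silver Delta table. The table and column names (bronze_orders, silver.orders, event_id, event_ts) are hypothetical placeholders, not a prescribed design.

```python
# Minimal sketch of an idempotent upsert from a Bronze batch into a Silver Delta table.
# Table/column names are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Keep only the latest version of each event so replays of the same batch are harmless.
latest = (
    spark.table("bronze_orders")
    .withColumn(
        "rn",
        F.row_number().over(Window.partitionBy("event_id").orderBy(F.col("event_ts").desc())),
    )
    .filter("rn = 1")
    .drop("rn")
)

target = DeltaTable.forName(spark, "silver.orders")

# MERGE keeps the write idempotent: re-running the job updates rows instead of duplicating them.
(
    target.alias("t")
    .merge(latest.alias("s"), "t.event_id = s.event_id")
    .whenMatchedUpdateAll(condition="s.event_ts >= t.event_ts")  # skip stale, late-arriving versions
    .whenNotMatchedInsertAll()
    .execute()
)
```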
What you'll bring
- 6+ years of data engineering experience (title flexible: Data Engineer / Senior Data Engineer).
- Strong SQL and one of Python/Scala. Deep familiarity with Spark (PySpark/SQL) and distributed data patterns.
- Hands-on with one or more clouds (Azure/AWS/GCP) and a lakehouse stack (e.g., Databricks, Delta Lake, Fabric Lakehouse/Eventhouse, or Synapse; BigQuery/Snowflake a plus).
- Streaming experience: Kafka/Confluent, Azure Event Hubs, or Service Bus; schema registries and exactly-once/at-least-once delivery semantics.
- Solid understanding of medallion architecture, CDC, SCD, upserts/merge, partitioning, Z-ordering, compaction, and vacuum (see the table-maintenance sketch after this list).
- Orchestration & DevOps: Airflow/ADF/Glue/Step Functions; Git-based workflows, unit/integration tests, environments, and IaC (Terraform/ARM/CDK) preferred.
- Data quality & governance: expectations/testing, lineage/metadata, RBAC/ABAC, PII protection (hashing/salting/tokenization).
- Comfortable owning services in production: monitoring, alerting, SLIs/SLOs, on-call rotation.
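As an illustration of the Delta maintenance work listed above (partitioning, Z-ordering, compaction, vacuum), here is a short sketch using the delta-spark Python API; the table and column names (gold.sales_daily, sale_date, customer_id) are placeholders.

```python
# Routine maintenance on a curated Delta table: compact small files, Z-order, vacuum.
# Assumes sale_date is the partition column; all names here are placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
tbl = DeltaTable.forName(spark, "gold.sales_daily")

# Compact recent partitions and co-locate rows by a common filter column.
tbl.optimize().where("sale_date >= '2024-01-01'").executeZOrderBy("customer_id")

# Drop data files no longer referenced by the table log (7-day retention here).
tbl.vacuum(retentionHours=168)
```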
What you've done (must-haves)
- 5+ years in data engineering with cloud data platforms (Azure preferred).
- Hands-on with Microsoft Fabric components: Eventhouse (KQL), Lakehouse (Delta on OneLake), Spark notebooks, Data Factory (Fabric pipelines), Power BI (including Direct Lake).
- Solid SQL/KQL/PySpark; comfort with nested JSON, mv-expand, update policies, materialized views, partitioning.
- Built production-grade streaming + batch pipelines; handled late/duplicate events, watermarking, and idempotency (see the streaming sketch after this list).
- Strong grasp of data modeling, performance tuning, and data quality (unit tests, anomaly checks, SLAs).
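For the streaming must-haves, a minimal Structured Streaming sketch of watermarking plus duplicate handling follows; the source and sink tables, checkpoint path, and column names are assumptions for illustration only.

```python
# Sketch of late/duplicate event handling with Structured Streaming watermarks.
# Source (bronze_events), sink (silver.events), checkpoint path, and columns are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = spark.readStream.table("bronze_events")

deduped = (
    events
    .withWatermark("event_ts", "30 minutes")       # bound state; events arriving later than this are dropped
    .dropDuplicatesWithinWatermark(["event_id"])   # Spark 3.5+; fall back to dropDuplicates on older runtimes
)

query = (
    deduped.writeStream
    .option("checkpointLocation", "Files/checkpoints/silver_events")  # placeholder path
    .toTable("silver.events")
)
```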
Nice to have
- Confluent Kafka private networking patterns; CDC from operational stores.
- Azure ecosystem: ADLS/OneLake, Key Vault, AAD, Purview, Event Hubs, Service Bus.
- MLOps/feature store basics; Python packaging & testing (pytest).
- Governance & compliance (GDPR/CCPA), PII handling, and secrets management (see the hashing sketch below).
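On the PII-handling side, a brief PySpark sketch of salted hashing; the table/column names are illustrative, and in practice the salt would come from a secret store such as Key Vault rather than source code.

```python
# Illustrative salted hashing of a PII column before publishing to a curated layer.
# Names and the inline salt are placeholders only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
salt = "replace-with-secret-from-key-vault"  # fetch from a secret store in practice

customers = spark.table("bronze_customers")
masked = (
    customers
    .withColumn("email_hash", F.sha2(F.concat(F.col("email"), F.lit(salt)), 256))
    .drop("email")  # keep only the hashed surrogate downstream
)
```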
Tech stack you'll touch
- Microsoft Fabric: Eventhouse/KQL, Lakehouse/Delta, Spark notebooks, Data Factory, Power BI (Direct Lake)
- Azure: Event Hubs, Service Bus, AAD, Key Vault, Purview
- Languages/tools: SQL, KQL, PySpark, Python, Git, CI/CD (Azure DevOps/GitHub)