Senior Data Engineer

Xebia

Jaipur, India

Fresher

Posted an hour ago

Job Description

Mandatory Skills:

Azure Databricks, Structured Streaming, Delta Lake, Delta Live Tables (DLT), Spark Declarative Pipelines (SDP), Databricks Asset Bundles, Unity Catalog, Auto Loader, CloudFiles, Apache Spark

Preferred Experience

Migration of traditional ETL pipelines to Spark Declarative Pipelines

Enterprise-scale Lakehouse implementation

Data Quality frameworks and governance solutions

Large-scale streaming architectures processing millions of events per day

Experience implementing Medallion Architecture

Streaming & Real-Time Data Processing

Design and develop real-time streaming pipelines using Databricks Structured Streaming.

Build and maintain Kafka-based ingestion frameworks.

Handle late-arriving events using watermarks, event-time processing, and stateful streaming concepts.

Implement exactly-once processing and checkpointing mechanisms.

Monitor and optimize streaming workloads for performance and reliability.

Spark Declarative Pipelines (SDP) & Delta Live Tables (DLT)

Design and implement Spark Declarative Pipelines using Databricks.
Develop Delta Live Tables (DLT) pipelines for scalable data transformation.
Implement data quality expectations, validations, retries, and failure handling within DLT pipelines.
Manage pipeline dependencies and orchestration using declarative approaches.
Understand the advantages of SDP over traditional ETL pipelines, including:

Simplified pipeline development

Reduced operational overhead

Automated lineage tracking

Improved maintainability

Enhanced observability

Databricks Asset Bundles

Develop and deploy Databricks Asset Bundles (DAB) for CI/CD.

Configure mandatory bundle components, including:

databricks.yml

Resource definitions

Job definitions

Environment configurations

Source code artifacts

Manage deployment across development, testing, and production environments.

Data Ingestion & Auto Loader

Build scalable ingestion frameworks using Databricks Auto Loader (CloudFiles).

Configure schema inference and schema evolution strategies.

Handle duplicate records and implement deduplication mechanisms.

Design robust ingestion pipelines from ADLS Gen2 and other cloud storage systems.

Work with CloudFiles architecture and incremental processing patterns.

Lakehouse & Medallion Architecture

Design and implement Bronze, Silver, and Gold layer architectures.

Build enterprise-grade data products using Lakehouse principles.

Implement Delta Lake optimization techniques.

Ensure data quality, governance, and lineage across layers.

Azure Data Engineering

Work extensively with:

Azure Data Lake Storage Gen2 (ADLS Gen2)

Azure Service Principals

Azure Key Vault

Azure Data Factory (ADF)

Azure Databricks

Implement secure authentication and authorization mechanisms.

Troubleshoot and debug ADF pipeline failures.

Spark Optimization & Performance Tuning

Optimize Spark jobs using:

Partitioning strategies

Adaptive Query Execution (AQE)

Broadcast joins

Caching and persistence

Z-Ordering

File compaction techniques

Analyze Spark UI for job failures and performance bottlenecks.

Troubleshoot executor failures, stage failures, skew issues, memory problems, and shuffle bottlenecks.