Xebia

Senior Data Engineer

  • Posted an hour ago

Job Description

Mandatory Skills:

  • Azure Databricks
  • Structured Streaming
  • Delta Lake
  • Delta Live Tables (DLT)
  • Spark Declarative Pipelines (SDP)
  • Databricks Asset Bundles
  • Unity Catalog
  • Auto Loader
  • CloudFiles
  • Apache Spark

Preferred Experience

  • Migration of traditional ETL pipelines to Spark Declarative Pipelines
  • Enterprise-scale Lakehouse implementation
  • Data quality frameworks and governance solutions
  • Large-scale streaming architectures processing millions of events per day
  • Experience implementing Medallion Architecture

Streaming & Real-Time Data Processing

Design and develop real-time streaming pipelines using Databricks Structured Streaming.

Build and maintain Kafka-based ingestion frameworks.

Handle late-arriving events using watermarks, event-time processing, and stateful streaming concepts.

Implement exactly-once processing and checkpointing mechanisms.

Monitor and optimize streaming workloads for performance and reliability.

Spark Declarative Pipelines (SDP) & Delta Live Tables (DLT)

  • Design and implement Spark Declarative Pipelines using Databricks.
  • Develop Delta Live Tables (DLT) pipelines for scalable data transformation.
  • Implement data quality expectations, validations, retries, and failure handling within DLT pipelines.
  • Manage pipeline dependencies and orchestration using declarative approaches.
  • Understand the advantages of SDP over traditional ETL pipelines, including simplified pipeline development, reduced operational overhead, automated lineage tracking, improved maintainability, and enhanced observability.

Databricks Asset Bundles

  • Develop and deploy Databricks Asset Bundles (DAB) for CI/CD.
  • Configure mandatory bundle components, including databricks.yml, resource definitions, job definitions, environment configurations, and source code artifacts.
  • Manage deployment across development, testing, and production environments.

Data Ingestion & Auto Loader

  • Build scalable ingestion frameworks using Databricks Auto Loader (CloudFiles).
  • Configure schema inference and schema evolution strategies.
  • Handle duplicate records and implement deduplication mechanisms.
  • Design robust ingestion pipelines from ADLS Gen2 and other cloud storage systems.
  • Work with CloudFiles architecture and incremental processing patterns.

Lakehouse & Medallion Architecture

  • Design and implement Bronze, Silver, and Gold layer architectures.
  • Build enterprise-grade data products using Lakehouse principles.
  • Implement Delta Lake optimization techniques.
  • Ensure data quality, governance, and lineage across layers.

Azure Data Engineering

Work extensively with:

  • Azure Data Lake Storage Gen2 (ADLS Gen2)
  • Azure Service Principals
  • Azure Key Vault
  • Azure Data Factory (ADF)
  • Azure Databricks

Implement secure authentication and authorization mechanisms.

Troubleshoot and debug ADF pipeline failures.

Spark Optimization & Performance Tuning

Optimize Spark jobs using:

  • Partitioning strategies
  • Adaptive Query Execution (AQE)
  • Broadcast joins
  • Caching and persistence
  • Z-Ordering
  • File compaction techniques

Analyze Spark UI for job failures and performance bottlenecks.

Troubleshoot executor failures, stage failures, skew issues, memory problems, and shuffle bottlenecks.

Unity Catalog & Governance

Implement and manage Unity Catalog.

Configure data access controls and governance policies.

Establish lineage tracking and data security standards.

Manage catalog, schema, and table-level permissions.

CI/CD & DevOps

Implement CI/CD pipelines for Databricks projects.

Work with Git branching strategies:

  • Feature branches
  • Pull requests
  • Code reviews
  • Merge processes

Integrate VS Code with Databricks.

Automate deployments using Databricks Asset Bundles and DevOps pipelines.

Experience with Azure DevOps, GitHub Actions, Jenkins, or equivalent CI/CD platforms.

Data Modeling & SQL

Develop complex SQL transformations.

Build optimized analytical data models.

Write performant SQL queries for large-scale datasets.

Design dimensional and Lakehouse data


Job ID: 150035175