Happiest Minds Technologies

LEAD DATA ENGINEER - Databricks

  • Posted 11 hours ago

Job Description

  • C4 – Lead Data Engineer (8–10 years)
  • Role Overview
  • Engineering lead who owns design and delivery end-to-end; raises the bar on performance, DataOps, and governance; mentors C3s.
  • Core Responsibilities
    • Design high-throughput batch & streaming pipelines (Structured Streaming/Auto Loader) to Medallion targets.
    • Implement advanced Delta: MERGE/SCD2, schema evolution/enforcement, OPTIMIZE/Z-ORDER/VACUUM, compaction.
    • Engineer observability: metrics, alerts, SLIs/SLOs, error budgets; on-call runbooks.
    • Implement Unity Catalog patterns (RBAC/ABAC), row/column-level security; Key Vault integration; PII/PHI handling.
    • Own cost & performance: Photon, autoscaling, cluster policies/pools; job SLAs.
    • Lead CI/CD (Repos/CLI/ADO) with automated unit/data tests and promotion gates; design reviews and reusable frameworks.
  • Must-have Skills (Screening Focus)
    • Deep PySpark/Spark SQL/Delta; DLT + Workflows orchestration.
    • MERGE/SCD2; schema evolution/enforcement; time travel.
    • Performance/cost levers: OPTIMIZE/Z-ORDER, file sizing, Photon, autoscaling; cluster policies/pools.
    • Unity Catalog RBAC/ABAC; secrets; PII/PHI controls.
    • CI/CD with tests (unit/data expectations) and environment promotion; on-call readiness.
  • Nice-to-have (Preferred, not Mandatory)
    • Kafka/Event Hubs streaming patterns; Synapse integration.
    • Delta Sharing, Lakehouse Federation; dbt for transformations.
  • 90-day Outcomes (Examples)
    • Reduce a priority job's runtime by ≥25% and cost/GB by ≥20% within 90 days.
    • Stand up CI/CD + tests + SLOs for one DLT pipeline with alerting; publish a reusable SCD2/DQ module.

More Info

Job ID: 147461367

Bengaluru, India

Skills:

PySpark, Java, AWS Redshift, AWS Glue, Data Warehousing, Data Architecture, NumPy, Pandas, Azure Synapse Analytics, Spark, Databricks, Data Modelling, Python, Airflow, Google BigQuery, Data integration tools, dbt, ETL processes, Dask, Delta Lake, Data pipeline orchestration