Search by job, company or skills

equifax india

Lead - Data Engineer

new job description bg glownew job description bg glownew job description bg svg
  • Posted 2 hours ago
  • Be among the first 10 applicants
Early Applicant

Job Description

Synopsis of the role

As a Lead Data Engineer, you will serve as the technical backbone of our data ecosystem, spearheading the design and implementation of high-performance data architectures using Azure Databricks and PySpark. You will be responsible for orchestrating complex, scalable ETL/ELT pipelines within Azure Data Factory, ensuring seamless data integration. By leveraging your mastery of SQL and distributed computing, you will optimize large-scale datasets to drive advanced analytics and business intelligence initiatives.

What you'll do

  • As a Lead Data Engineer, you are expected to drive the technical roadmap and execution of our data strategy. Your role will encompass the following core responsibilities:
  • Medallion Architecture Implementation: Design and maintain a multi-layered data lakehouse (Bronze, Silver, Gold) to ensure data quality, lineage, and structural refinement from raw ingestion to business-ready assets.
  • Delta Lake Development: Build and optimize high-performance tables using Delta Lake, leveraging features like ACID transactions, schema enforcement, and time travel to ensure data reliability.
  • Star Schema Data Modeling: Architect robust dimensional models and Star Schemas in the Gold layer to simplify data access and optimize query performance for downstream BI tools.
  • Data Governance with Unity Catalog: Implement and manage centralized Unity Catalog configurations to enforce fine-grained access control, data discovery, and comprehensive lineage across the Azure workspace.
  • Scalable PySpark Engineering: Develop, test, and deploy complex data transformation logic using PySpark, ensuring efficient distributed processing and resource utilization within Databricks clusters.
  • End-to-End Pipeline Orchestration: Create and monitor sophisticated ETL/ELT workflows using Azure Data Factory (ADF), integrating diverse data sources into a unified cloud ecosystem.
  • Data Mart Development: Build specialized, high-performance Data Marts tailored to specific business domains, enabling self-service analytics and rapid decision-making for stakeholders.
  • Advanced SQL Optimization: Write and tune complex SQL queries for data analysis and validation, ensuring that data processing logic is both performant and cost-effective.
  • Performance Tuning & CI/CD: Lead efforts in cluster configuration, partition tuning, and the automation of deployment pipelines using Azure DevOps to ensure high availability and continuous delivery.

What Experience You Need

  • Total Data Engineering Experience: Minimum of 8+ years in data engineering, data warehousing, or Database development roles.
  • Azure Ecosystem Expertise: At least 4+ years of hands-on experience specifically building and deploying production-grade solutions on Azure.
  • Databricks & PySpark Mastery: A minimum of 3+ years leading projects that utilize Azure Databricks and PySpark for large-scale distributed data processing.
  • Lead/Architectural Experience: At least 2+ years in a Lead or Senior capacity, with documented experience in designing end-to-end data architectures (e.g., transitioning a legacy system to a Medallion architecture).
  • SQL Proficiency: 6+ years of advanced SQL development, including performance tuning, complex window functions, and stored procedure optimization.
  • Production Pipeline Delivery: Proven track record of deploying at least 3-5 enterprise-scale data pipelines using Azure Data Factory (ADF) from inception to production.
  • Education/Certifications: Bachelor's / Master's degree in CS or related field, Possession of at least one relevant professional certification, such as Microsoft Certified: Azure Data Engineer Associate (DP-203) or Databricks Certified Data Engineer Professional.

What could set you apart

  • Advanced Databricks Optimization & Lakehouse Features
  • Databricks SQL & Serverless: Experience migrating traditional SQL workloads to Databricks SQL Warehouses to reduce latency and overhead.
  • Delta Live Tables (DLT): Proven ability to implement declarative data pipelines that handle task orchestration, monitoring, and quality constraints automatically.
  • Liquid Clustering: Mastery of the latest replacement for Z-Ordering to optimize data layout and query performance without manual partition management.
  • DevOps & Infrastructure as Code (IaC)
  • Terraform/Bicep: Experience deploying entire Azure environments (Resource Groups, Storage Accounts, Databricks Workspaces) using IaC to ensure environment parity across Dev, QA, and Production.
  • Unit Testing for Spark: Experience using frameworks like pytest or chispa to validate PySpark logic, ensuring a robust CI/CD cycle rather than testing in production.
  • Comprehensive Data Governance & Security
  • Unity Catalog Migration: Experience leading a migration from legacy system to Unity Catalog, including managing Identity Federation and Cross-Workspace sharing.
  • Fine-Grained Security: Implementation of Row-Level Security (RLS) and Column-Level Masking directly within Databricks to comply with strict privacy regulations (GDPR/CCPA).
  • Real-Time & Hybrid Processing
  • Structured Streaming: Building production-grade, low-latency pipelines that process data in real-time or near-real-time from Azure Event Hubs or IoT Hubs.
  • Change Data Capture (CDC): Implementing efficient CDC patterns (using tools like Debezium or ADF's built-in CDC) to sync on-premise relational databases with the Delta Lake in near-real-time.
  • Cost Governance & FinOps
  • DBU & Cost Management: A track record of implementing Databricks Cluster Policies and tagging strategies to monitor and reduce DBU (Databricks Unit) consumption.
  • Optimization of ADF Triggers: Knowledge of when to use Tumbling Window vs. Schedule triggers and optimizing Integration Runtimes to minimize execution costs.

#India

More Info

Job Type:
Industry:
Employment Type:

About Company

Job ID: 145041071

Similar Jobs