
Role Summary
We are seeking an experienced Senior Azure Databricks Engineer to lead the design, development, and optimization of scalable and fault-tolerant data solutions using pipelines and notebooks, implementing strong data quality and governance solutions in a Lakehouse architecture. You will work across engineering, analytics, and business teams to build robust, high-performance data workflows and ensure best-in-class delivery on Azure and Databricks platforms. Candidates must have hands-on experience with traditional Data Warehouses (on any RDBMS such as SQL Server, Oracle, PostgreSQL, etc.), Business Analytics/BI, and end-to-end BI & Analytics project delivery from requirements through production.
Key Responsibilities
Platform & Infrastructure
- Oversee Databricks platform configuration, resource management, workspace structuring, and cluster optimization.
- Monitor and troubleshoot performance issues across clusters, jobs, notebooks, and pipelines.
- Implement governance, security, compliance, and data access control using Role-Based Access Control (RBAC) and Unity Catalog.
- Establish and enforce data governance controls (lineage, cataloging, ownership) and cost management guardrails across workspaces and clusters.
- Databricks Architecture & Workspaces: Design multi-workspace (DEV/TEST/PROD) patterns, metastore strategy in Unity Catalog, environment isolation, VNet injection/Private Link, and secure data access to ADLS Gen2.
- Compute Strategy: Define and govern use of All-Purpose Clusters vs Job Clusters, SQL Warehouses (Pro/Serverless), and Photon acceleration; set Cluster Policies (node families, autoscaling, idle timeouts, spot/preemptible usage, limits).
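The cluster policies mentioned above are JSON documents of per-attribute constraints. As a hedged illustration (the attribute values and node type are hypothetical, and the checker is a simplified stand-in for what the Databricks platform enforces server-side), a policy pinning auto-termination and capping autoscaling might look like:

```python
# Illustrative sketch only: a Databricks cluster policy is a JSON map of
# attribute constraints. Values below are hypothetical examples of the
# idle-timeout and autoscaling guardrails described above.
cluster_policy = {
    "autotermination_minutes": {"type": "fixed", "value": 30},
    "autoscale.max_workers": {"type": "range", "maxValue": 8},
    "node_type_id": {"type": "allowlist", "values": ["Standard_DS3_v2"]},
}

def violations(config: dict, policy: dict) -> list:
    """Return human-readable violations for a flattened cluster config."""
    problems = []
    for attr, rule in policy.items():
        value = config.get(attr)
        if rule["type"] == "fixed" and value != rule["value"]:
            problems.append(f"{attr} must be {rule['value']}")
        elif rule["type"] == "range" and value is not None and value > rule["maxValue"]:
            problems.append(f"{attr} exceeds {rule['maxValue']}")
        elif rule["type"] == "allowlist" and value not in rule["values"]:
            problems.append(f"{attr} not in allowed list")
    return problems

# A config requesting 16 workers violates the autoscaling cap:
config = {"autotermination_minutes": 30,
          "autoscale.max_workers": 16,
          "node_type_id": "Standard_DS3_v2"}
print(violations(config, cluster_policy))
```

In practice such a policy would be attached to workspaces so users can only create clusters that satisfy it; the point of the sketch is the constraint-checking shape, not the exact schema.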
Pipeline Development & Architecture
- Design and implement end-to-end data pipelines using PySpark, SQL, and Delta Lake within a medallion architecture using Data Factory and Databricks.
- Build real-time and batch DLT pipelines using Databricks Delta Live Tables with a focus on reliability and scalability.
- Optimize Lakehouse architecture for performance, cost-efficiency, and data integrity.
- Automate data ingestion, transformation, and validation, including support for streaming (Auto Loader) and scheduled workflows.
- Perform data transformations, cleansing, and validations using data quality rules for consistent and accurate data sets.
- Manage and monitor job orchestration, ensuring efficient pipeline runs and reliability.
- Design and maintain traditional Data Warehouse layers (staging, EDW, data marts) on RDBMS platforms and integrate them with the Lakehouse.
- Define and implement dimensional models (star/snowflake), semantic layers, and conformed dimensions to serve BI/Analytics.
- Performance Engineering (Delta/Spark): Apply file sizing best practices, OPTIMIZE/ZORDER, data skipping, AQE, broadcast/hash/sort-merge joins, shuffle tuning, caching, checkpointing for streaming, and VACUUM/retention management.
- SQL Warehouses Optimization: Right-size warehouse tiers, govern concurrency, result caching, and Photon-enabled execution for BI queries.
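The data-quality rules and expectations mentioned above are, in Delta Live Tables, boolean predicates attached to tables (e.g., via `@dlt.expect_or_drop`). A framework-free Python sketch of the same idea (the rule names and row schema are hypothetical) shows the pass/quarantine split:

```python
# Minimal sketch of expectation-style data-quality rules, framework-free.
# In DLT these would be declared as expectations; here, plain predicates.
rules = {
    "valid_id": lambda r: r.get("id") is not None,
    "positive_amount": lambda r: r.get("amount", 0) > 0,
}

def apply_expectations(rows, rules):
    """Split rows into (passed, quarantined_with_failed_rule_names)."""
    passed, quarantined = [], []
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        if failed:
            quarantined.append((row, failed))
        else:
            passed.append(row)
    return passed, quarantined

rows = [{"id": 1, "amount": 10.0},
        {"id": None, "amount": 5.0},   # fails valid_id
        {"id": 2, "amount": -1.0}]     # fails positive_amount
good, bad = apply_expectations(rows, rules)
print(len(good), len(bad))  # 1 2
```

Quarantining failed rows (rather than silently dropping them) keeps the bad data auditable, which matches the governance emphasis of the role.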
CI/CD & DevOps
- Design and maintain CI/CD pipelines for Databricks artifacts (notebooks, jobs, libraries) using tools such as Azure DevOps, GitHub Actions, Terraform, or Jenkins.
- Support trunk-based development, deployment workflows, and infrastructure-as-code practices.
- Manage version control and automated testing using Git and related DevOps practices.
- Implement automated data quality testing (e.g., expectations/validations) and deployment gates across environments.
- Cost Governance & Observability: Implement cluster/warehouse tagging, budgets/alerts, usage dashboards, job run telemetry, cost attribution by product/domain, and periodic rightsizing reviews.
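Cost attribution by tag, as described above, reduces to grouping usage records by a tag key and pricing them. A hedged sketch (the record fields, tag key, and DBU rate are all hypothetical, not the real billing schema):

```python
from collections import defaultdict

# Hypothetical usage records, e.g. exported from billing/usage tables;
# field names here are illustrative, not the actual schema.
usage = [
    {"tags": {"domain": "sales"},   "dbus": 120.0},
    {"tags": {"domain": "finance"}, "dbus": 80.0},
    {"tags": {"domain": "sales"},   "dbus": 40.0},
    {"tags": {}, "dbus": 10.0},  # untagged usage must still be surfaced
]

def cost_by_tag(records, tag_key="domain", dbu_rate=0.40):
    """Attribute spend by a tag key; untagged usage goes to 'unattributed'."""
    totals = defaultdict(float)
    for rec in records:
        owner = rec["tags"].get(tag_key, "unattributed")
        totals[owner] += rec["dbus"] * dbu_rate
    return dict(totals)

print(cost_by_tag(usage))
```

Surfacing an explicit "unattributed" bucket is the practical lever: it shows how much spend escapes the tagging standard and needs chargeback follow-up.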
Collaboration & Delivery
- Collaborate with product owners, business stakeholders, and data teams to gather requirements and translate them into technical solutions.
- Drive the adoption of best practices in coding, versioning, testing, deployment, monitoring, and security.
- Provide thought leadership on best practices in Data Engineering, Architecture, and Cloud Computing.
- Lead end-to-end BI & Analytics project delivery, including requirements, solution design, backlog planning, UAT, documentation, and production operationalization.
- Partner with BI teams to enable Business Analytics (dashboards, KPIs, self-service models) and ensure semantic consistency across domains.
- Stakeholder Enablement: Define cluster policies, workspace standards, and usage guidelines; conduct training for engineering/analytics users to drive performance and cost discipline.
Performance Optimization
- Deliver optimized Spark jobs and SQL queries for large-scale data processing.
- Implement partitioning, caching, and indexing strategies to improve performance and scalability of big data workloads.
- Conduct POCs for capacity planning and recommend appropriate infrastructure optimizations for cost-effectiveness.
- Optimize RDBMS-based Data Warehouses (query tuning, indexing, partitioning) and BI semantic layers for performance.
- Apply Photon acceleration, Delta constraints, schema evolution controls, and data compaction strategies to balance performance and cost.
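The file-sizing and compaction work described above is often planned with back-of-envelope arithmetic: given a partition's size and a target file size (128 MB is a commonly cited starting point for Delta tables; the right value is workload-dependent), estimate how many files compaction should produce:

```python
import math

# Back-of-envelope sketch for Delta compaction planning: how many files
# should OPTIMIZE produce for a partition, given a target file size?
# 128 MB is a common default starting point; tune per workload.
def target_file_count(partition_bytes: int,
                      target_bytes: int = 128 * 1024**2) -> int:
    """Smallest file count such that each file is at most target_bytes."""
    return max(1, math.ceil(partition_bytes / target_bytes))

# A 10 GiB partition full of tiny files compacts to 80 target-sized files:
print(target_file_count(10 * 1024**3))  # 80
```

Fewer, well-sized files reduce listing and task-scheduling overhead and improve data skipping, which is the motivation behind the OPTIMIZE/compaction bullet above.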
Documentation & Knowledge Sharing
- Create and review detailed documentation for data workflows, SOPs, architectural reviews, and operational runbooks.
- Mentor junior team members and promote a culture of learning and innovation.
- Promote a culture of optimization and cost savings, and enable research-driven development.
Required Qualifications
Technical Expertise
- 7+ years in data engineering, with 3+ years of Databricks and Azure experience.
- Hands-on experience with Data Factory, Databricks Lakehouse Architecture, Delta Lake, PySpark, and Spark job optimization.
- Proficiency in Python, SQL, and optionally Scala for building scalable ETL/ELT pipelines.
- Strong SQL skills are essential, with handson experience in SQL Server, Oracle, PostgreSQL, or other traditional RDBMS platforms.
- Strong experience in designing and optimizing DLT pipelines, managing assets like notebooks and libraries, and configuring Databricks Workspaces.
- Strong foundation in Data Warehousing principles (Kimball/Inmon) and experience designing dimensional data models optimized for reporting.
- Core Databricks Skills:
- Architecture: Unity Catalog metastore design, workspace segmentation, secure networking (VNet injection, private link), catalog/schema governance, and data access patterns to ADLS/Fabric.
- Compute: Selection and governance of All-Purpose vs Job Clusters, SQL Warehouses (Classic/Pro/Serverless), Photon, autoscaling, node families (memory/compute optimized), and spot/preemptible usage strategies.
- Performance: ZORDER/OPTIMIZE, file size/compaction, partitioning/data skipping, AQE, broadcast join strategies, shuffle tuning, caching, streaming checkpoints, CDC/merge optimization.
- Cost Optimization: Cluster policies, idle timeouts, autotermination, tags/budgets/alerts, job clusters over interactive for production, rightsizing warehouses, and periodic cost reviews.
- Demonstrated experience delivering full-cycle BI & Analytics projects (requirements, modeling, ETL/ELT, semantic layer, testing, deployment, and support).
- Experience integrating data from ERP systems (e.g., SAP, Oracle) and operational sources into EDW/Lakehouse.
- Familiarity with data governance, cataloging, lineage, and role-based data access.
Preferred Qualifications
- Databricks or cloud certifications (e.g., Databricks Certified Data Engineer Associate/Professional, Azure Data Engineer Associate).
- Prior Data Warehouse experience and experience migrating on-premises Data Warehouses to Azure Databricks.
- Experience with Business Analytics, Business Intelligence, or providing BI/analytical solutions for any ERP system.
- Advanced expertise in PySpark and Spark DAG orchestration and optimization techniques.
- Automation experience with CI/CD pipelines using Azure DevOps, Jenkins, or Octopus.
- Familiarity with data mesh principles, data governance, and distributed architecture patterns.
- Knowledge of observability tools, Airflow, dbt, Snowflake, Fabric, or Fivetran is a plus.
- Experience with BI tooling and semantic/MDX/Tabular models (e.g., Power BI, Tableau, Looker; Azure Analysis Services/Fabric semantic models).
- Experience defining KPIs, metrics catalogs, and data contracts with business stakeholders.
- Exposure to security and compliance requirements (PII handling, encryption at rest/in transit, auditability).
Soft Skills
- Strong communication and stakeholder management skills; ability to translate business needs into scalable technical designs.
- Proven ability to lead cross-functional delivery, prioritize effectively, and operate within agile frameworks.
Job ID: 142898681