Data Engineer Sr (Databricks)

NTT DATA North America

Bengaluru, India

8-10 Years

Posted 6 hours ago

Job Description

Roles & Responsibilities

Develop Databricks notebooks, jobs, and workflows to replicate and enhance DB2/Guidewire-based pipelines and transformations.
Implement Delta Lake tables and patterns (bronze/silver/gold, ACID, time travel, schema evolution) for migrated data.
Integrate Databricks with AWS/S3 or Azure ADLS, ADF/Synapse, Key Vault, and Snowflake as required.
Optimize Databricks clusters, jobs, and queries for performance and cost.
Implement incremental loads, CDC patterns, and batch schedules for large datasets.
Collaborate with Snowflake and dbt teams to ensure consistent data models and data contracts.
Participate in data validation and reconciliation between DB2/400 and Guidewire sources and Databricks outputs.
Follow coding standards, version control, and CI/CD practices using Git/Azure DevOps.
Provide defect fixes and support during SIT/UAT and post go-live stabilization.

Experience:

Legacy Demystification & Ingestion (DB2/400 & Guidewire)

8+ years of experience with Databricks, including understanding complex legacy data models and moving that data into the cloud:

Extracting DB2/AS400: Experience with Change Data Capture (CDC) or scheduled batch extractions from DB2 into cloud storage. This involves working through JDBC connections, mapping table dependencies, and re-platforming legacy SQL to distributed computing standards.
Handling Guidewire Data: Integrating with Guidewire Cloud Data Access (CDA) or InsuranceSuite to replicate complex P&C (Property & Casualty) insurance schemas. Senior engineers parse these highly normalized operational databases and transform them into analytics-friendly schemas in the cloud.

Architecture & Pipeline Development

The core of the experience involves transitioning these legacy, row-based stores into a scalable Medallion Architecture (Bronze, Silver, Gold layers):

Delta Lake Optimization: Using Databricks and Apache Spark to build ETL/ELT data pipelines with ACID transactions. Senior engineers handle schema evolution, upserts, and slowly changing dimensions (SCD Type 2).
Business Logic Refactoring: Translating rigid legacy procedural code (e.g., RPG/COBOL background logic, stored procedures) into scalable distributed patterns (PySpark, Spark SQL, and Scala).

Data Governance & Observability

A senior engineer is expected to govern vast amounts of incoming and generated data across the enterprise:

Unity Catalog: Implementing strict data governance, lineage tracing, and table-level security.
Data Quality: Automating data validation frameworks to ensure a seamless transition from legacy to modern systems without data loss or corruption.

Integration with the Databricks Platform Ecosystem

Moving beyond basic storage to use the full power of the Databricks Data Intelligence Platform:

Serverless Compute: Managing Databricks serverless resources, ensuring optimal cluster sizing, and reducing compute costs.
Streaming and Batch Workflows: Building event-driven pipelines using features like Databricks Auto Loader to ingest flat files and streaming records directly into Delta tables.