NTT DATA North America

Data Engineer Sr (Databricks)

Job Description

Roles & Responsibilities

  • Develop Databricks notebooks, jobs, and workflows to replicate and enhance DB2/Guidewire-based pipelines and transformations.
  • Implement Delta Lake tables and patterns (bronze/silver/gold, ACID, time travel, schema evolution) for migrated data.
  • Integrate Databricks with AWS/S3 or Azure ADLS, ADF/Synapse, Key Vault, and Snowflake as required.
  • Optimize Databricks clusters, jobs, and queries for performance and cost.
  • Implement incremental loads, CDC patterns, and batch schedules for large datasets.
  • Collaborate with Snowflake and dbt teams to ensure consistent data models and data contracts.
  • Participate in data validation and reconciliation between DB2/400 and Guidewire sources and Databricks outputs.
  • Follow coding standards, version control, and CI/CD practices using Git/Azure DevOps.
  • Provide defect fixes and support during SIT/UAT and post go-live stabilization.

Experience:

Legacy Demystification & Ingestion (DB2/400 & Guidewire)

8+ years of experience with Databricks, including understanding complex legacy data models and moving that data into the cloud:

  • Extracting from DB2/AS400: Experience with Change Data Capture (CDC) or scheduled batch extractions from DB2 into cloud storage. This involves working through JDBC connections, mapping table dependencies, and re-platforming legacy SQL to distributed computing standards.
  • Handling Guidewire Data: Integrating with Guidewire Cloud Data Access (CDA) or InsuranceSuite to replicate complex P&C (Property & Casualty) insurance schemas. Senior engineers parse these highly normalized operational databases and transform them into analytical-friendly schemas in the cloud.

Architecture & Pipeline Development

The core of the experience involves transitioning these legacy, row-based stores into a scalable Medallion Architecture (Bronze, Silver, Gold layers):

  • Delta Lake Optimization: Using Databricks and Apache Spark to build ETL/ELT data pipelines with ACID transactions. Senior engineers handle schema evolution, upserts, and slowly changing dimensions (SCD Type 2).
  • Business Logic Refactoring: Translating rigid legacy procedural code (e.g., RPG/COBOL batch logic, stored procedures) into scalable distributed patterns using PySpark, Spark SQL, and Scala.

Data Governance & Observability

A senior engineer is expected to govern vast amounts of incoming and generated data across the enterprise:

  • Unity Catalog: Implementing strict data governance, lineage tracing, and table-level security.
  • Data Quality: Automating data validation frameworks to ensure a seamless transition from legacy to modern systems without data loss or corruption.

Integration with the Databricks Platform Ecosystem

This means moving beyond basic storage to use the full Databricks Data Intelligence Platform:

  • Serverless Compute: Managing Databricks serverless resources, ensuring optimal cluster sizing, and reducing compute costs.
  • Streaming and Batch Workflows: Building event-driven pipelines using features like Databricks Auto Loader to ingest flat files and streaming records directly into Delta tables.

More Info

Job ID: 149091441

Bengaluru, India

Skills:

Snowflake, Databricks, AWS S3, Apache Spark, ADF, PySpark, Spark SQL, Git, Scala, Azure DevOps, Synapse, Delta Lake, Key Vault, Azure ADLS