Roles & Responsibilities
- Develop Databricks notebooks, jobs, and workflows to replicate and enhance DB2/Guidewire-based pipelines and transformations.
- Implement Delta Lake tables and patterns (bronze/silver/gold, ACID, time travel, schema evolution) for migrated data.
- Integrate Databricks with AWS S3 or Azure ADLS, ADF/Synapse, Key Vault, and Snowflake as required.
- Optimize Databricks clusters, jobs, and queries for performance and cost.
- Implement incremental loads, CDC patterns, and batch schedules for large datasets.
- Collaborate with Snowflake and dbt teams to ensure consistent data models and data contracts.
- Participate in data validation and reconciliation between DB2/400 / Guidewire and Databricks outputs.
- Follow coding standards, version control, and CI/CD practices using Git/Azure DevOps.
- Provide defect fixes and support during SIT/UAT and post go-live stabilization.
Experience:
Legacy Demystification & Ingestion (DB2/400 & Guidewire)
8+ years of experience with Databricks, including understanding complex legacy data models and moving that data into the cloud:
- Extracting from DB2/AS400: Experience with Change Data Capture (CDC) or scheduled batch extractions from DB2 into cloud storage. This involves working through JDBC connections, mapping table dependencies, and re-platforming legacy SQL to distributed computing standards.
- Handling Guidewire Data: Integrating with Guidewire Cloud Data Access (CDA) or InsuranceSuite to replicate complex P&C (Property & Casualty) insurance schemas. Senior engineers parse these highly normalized operational databases and transform them into analytics-friendly schemas in the cloud.
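As an illustration of the scheduled batch extraction mentioned above, here is a minimal sketch of a high-water-mark incremental pull. It is an assumption-laden example, not a prescribed implementation: `sqlite3` stands in for a DB2 JDBC connection, and the `policy` table, its `updated_at` column, and the `extract_increment` helper are all hypothetical names.

```python
import sqlite3

def extract_increment(conn, table, watermark_col, last_watermark):
    """Return rows changed after last_watermark, plus the new high-water mark."""
    # Table and column names come from trusted job config in this sketch,
    # so they are interpolated; the watermark value is bound as a parameter.
    query = f"SELECT * FROM {table} WHERE {watermark_col} > ? ORDER BY {watermark_col}"
    cur = conn.execute(query, (last_watermark,))
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    # If nothing changed, keep the previous watermark so the next run resumes correctly.
    new_watermark = rows[-1][watermark_col] if rows else last_watermark
    return rows, new_watermark

# Hypothetical legacy table, built in memory as a stand-in for DB2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (policy_id INTEGER, premium REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO policy VALUES (?, ?, ?)",
    [
        (1, 500.0, "2024-01-01T00:00:00"),
        (2, 750.0, "2024-01-02T00:00:00"),
        (3, 900.0, "2024-01-03T00:00:00"),
    ],
)

# Only rows modified after the stored watermark are pulled.
rows, watermark = extract_increment(conn, "policy", "updated_at", "2024-01-01T00:00:00")
```

A watermark only captures inserts and updates; hard deletes in the source need a log-based CDC feed or a periodic full comparison.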
Architecture & Pipeline Development
The core of the experience involves transitioning these legacy, row-based stores into a scalable Medallion Architecture (Bronze, Silver, Gold layers):
- Delta Lake Optimization: Using Databricks and Apache Spark to build ETL/ELT data pipelines with ACID transactions. Senior engineers handle schema evolution, upserts, and slowly changing dimensions (SCD Type 2).
- Business Logic Refactoring: Translating rigid legacy procedural code (e.g., RPG/COBOL background logic, stored procedures) into scalable distributed patterns (PySpark, Spark SQL, and Scala).
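To make the SCD Type 2 requirement concrete, the sketch below shows the close-and-reopen logic in plain Python. On Databricks this would more typically be expressed as a Delta Lake `MERGE`; the `DimRow` class, the `apply_scd2` function, and the sample keys are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class DimRow:
    key: str
    attrs: tuple                      # tracked attributes, e.g. (city,)
    valid_from: date
    valid_to: Optional[date] = None   # None means the version is still open
    is_current: bool = True

def apply_scd2(dim, incoming, as_of):
    """Return a new dimension list after applying incoming {key: attrs} as of a date."""
    current = {r.key: r for r in dim if r.is_current}
    result = []
    for row in dim:
        changed = row.is_current and row.key in incoming and incoming[row.key] != row.attrs
        # Close the current version when a tracked attribute changed; keep history untouched.
        result.append(replace(row, valid_to=as_of, is_current=False) if changed else row)
    for key, attrs in incoming.items():
        # Open a new version for brand-new keys and for keys whose attributes changed.
        if key not in current or current[key].attrs != attrs:
            result.append(DimRow(key, attrs, valid_from=as_of))
    return result

dim = [
    DimRow("P-1", ("Austin",), date(2023, 1, 1)),
    DimRow("P-2", ("Boston",), date(2023, 1, 1)),
]
# P-1 changes city, P-3 is new, P-2 is absent from the batch and stays as-is.
updated = apply_scd2(dim, {"P-1": ("Dallas",), "P-3": ("Denver",)}, as_of=date(2024, 6, 1))
```

The old P-1 row is retained with `valid_to` set, so point-in-time queries can still see the earlier value.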
Data Governance & Observability
A senior engineer is expected to govern vast amounts of incoming and generated data across the enterprise:
- Unity Catalog: Implementing strict data governance, lineage tracking, and table-level security.
- Data Quality: Automating data validation frameworks to ensure a seamless transition from legacy to modern systems without data loss or corruption.
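One way such a validation framework can check for loss or corruption is a key-by-key reconciliation of legacy and migrated rows. The following is a minimal sketch under stated assumptions: rows are already loaded as dictionaries, and the `reconcile` and `row_fingerprint` helpers and the `claim_id`/`amount` columns are hypothetical.

```python
import hashlib

def row_fingerprint(row, columns):
    """Hash the chosen columns of a row in a fixed order."""
    # Note: None and empty string hash the same here; adjust if that distinction matters.
    joined = "|".join("" if row.get(c) is None else str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def reconcile(source_rows, target_rows, key, columns):
    """Compare two row sets by business key and per-row fingerprint."""
    src = {r[key]: row_fingerprint(r, columns) for r in source_rows}
    tgt = {r[key]: row_fingerprint(r, columns) for r in target_rows}
    return {
        "source_count": len(src),   # counts distinct keys, not raw rows
        "target_count": len(tgt),
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }

legacy = [
    {"claim_id": 1, "amount": "100.00"},
    {"claim_id": 2, "amount": "250.50"},
    {"claim_id": 3, "amount": "75.00"},
]
migrated = [
    {"claim_id": 1, "amount": "100.00"},
    {"claim_id": 2, "amount": "250.05"},  # value corrupted in flight
]
report = reconcile(legacy, migrated, key="claim_id", columns=["amount"])
```

Values are compared as strings, so numeric and date columns should be normalized to one format on both sides before fingerprinting.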
Integration with the Databricks Platform Ecosystem
Moving beyond basic storage to use the full Databricks Data Intelligence Platform:
- Serverless Compute: Managing Databricks serverless resources, ensuring optimal cluster sizing, and reducing compute costs.
- Streaming and Batch Workflows: Building event-driven pipelines using features like Databricks Auto Loader to ingest flat files and streaming records directly into Delta tables.