Pipeline Development: Design, build, and deploy robust ETL/ELT pipelines (PySpark, SQL, Delta Lake) to ingest, transform, and curate governance and operational metadata from multiple sources landed in Databricks.
Granular Data Quality Capture: Implement profiling logic to capture issue-level metadata (source table, column, timestamp, severity, rule type) to support drill-down from dashboards into specific records and enable targeted remediation.
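As a minimal sketch of the issue-level capture described above: each flagged problem is recorded with its source table, column, timestamp, severity, and rule type so dashboards can drill down to it. The names `DataQualityIssue` and `profile_null_rate`, and the null-rate rule itself, are illustrative assumptions, not part of the role's actual codebase.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence

@dataclass
class DataQualityIssue:
    # Issue-level metadata enabling dashboard drill-down and targeted remediation
    source_table: str
    column: str
    rule_type: str
    severity: str
    detected_at: str  # ISO-8601 UTC timestamp

def profile_null_rate(
    table: str, column: str, values: Sequence, threshold: float = 0.05
) -> Optional[DataQualityIssue]:
    """Hypothetical rule: flag a column whose null rate exceeds the threshold."""
    if not values:
        return None
    rate = sum(1 for v in values if v is None) / len(values)
    if rate <= threshold:
        return None
    return DataQualityIssue(
        source_table=table,
        column=column,
        rule_type="null_rate",
        severity="high" if rate > 0.2 else "medium",
        detected_at=datetime.now(timezone.utc).isoformat(),
    )
```

In a real pipeline the same record shape would be emitted as rows of a curated Delta table rather than Python objects, keyed so a dashboard filter on table/column/rule lands on the exact offending records.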
Governance Metrics Automation: Develop data pipelines to generate metrics for dashboards covering data quality, lineage, job monitoring, access & permissions, query cost, usage & consumption, retention & lifecycle, policy enforcement, sensitive data mapping, and governance KPIs.
Microsoft Purview Integration: Automate asset onboarding, metadata enrichment, classification tagging, and lineage extraction for integration into governance reporting.
Data Retention & Policy Enforcement: Implement logic for retention tracking and policy compliance monitoring (masking, row-level security (RLS), exceptions).
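The retention-tracking logic above can be sketched as a simple age-versus-policy check: compare each table's last modification time to its configured retention window and report violations. `TableRetentionRecord` and `retention_violations` are assumed names for illustration; a production version would read this metadata from catalog or system tables.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Sequence

@dataclass
class TableRetentionRecord:
    table_name: str
    last_modified: datetime
    retention_days: int  # policy: data older than this should be purged/archived

def retention_violations(
    records: Sequence[TableRetentionRecord], now: Optional[datetime] = None
) -> List[str]:
    """Return names of tables whose age exceeds their retention window."""
    now = now or datetime.now(timezone.utc)
    return [
        r.table_name
        for r in records
        if now - r.last_modified > timedelta(days=r.retention_days)
    ]
```

The output feeds naturally into a compliance dashboard: each violating table becomes a row with its policy window and overage, alongside masking/RLS coverage checks.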
Job & Query Monitoring: Build pipelines to track job performance, SLA adherence, and query costs for cost and performance optimization.
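One way to express the SLA-adherence tracking above is as the fraction of job runs completing within their SLA; this metric, and the function name `sla_adherence`, are illustrative assumptions, with run durations in practice sourced from job run history rather than passed in directly.

```python
from typing import Sequence

def sla_adherence(run_durations_s: Sequence[float], sla_seconds: float) -> float:
    """Fraction of job runs that finished within the SLA (1.0 if no runs yet)."""
    if not run_durations_s:
        return 1.0
    within = sum(1 for d in run_durations_s if d <= sla_seconds)
    return within / len(run_durations_s)
```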
Metadata Storage & Optimization: Maintain curated Delta tables for governance metrics, structured for efficient dashboard consumption.