Position: Data Quality Engineer (8–12 Years)
Location: Bangalore
Notice Period: Immediate
Role Summary
We are seeking a hands-on Data Quality Engineer to design, implement, and operate automated data quality controls across an AWS + Databricks Lakehouse platform. The role ensures data is trusted from ingestion (streaming/batch) through the curated (Gold) layer by implementing quality rules, validation frameworks, monitoring, and remediation workflows aligned with business and governance standards.
Key Responsibilities
Data Quality Strategy & Rule Implementation
- Define and implement data quality dimensions (accuracy, completeness, timeliness, consistency, uniqueness, validity) across Bronze/Silver/Gold datasets.
- Partner with business/data owners to translate requirements into DQ rules, thresholds, and acceptance criteria.
- Maintain a DQ rule repository and ensure versioning, traceability, and approvals (a sketch of one such rule entry follows this list).
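For illustration, a minimal Python sketch of what a versioned, traceable rule entry might look like is below; the `DQRule` dataclass, its field names, and the sample rules are assumptions for this posting, not an existing schema or tool API.

```python
from dataclasses import dataclass

# Illustrative only: a minimal, versioned rule record of the kind a DQ rule
# repository might hold. Every name here is hypothetical, not a tool's API.
@dataclass(frozen=True)
class DQRule:
    rule_id: str       # stable identifier for traceability
    dataset: str       # target table, e.g. "silver.orders"
    dimension: str     # completeness, validity, uniqueness, ...
    expression: str    # SQL predicate each row must satisfy
    threshold: float   # minimum pass rate before the rule fails
    version: int       # incremented on every approved change
    approved_by: str   # sign-off recorded for audit and approvals

RULES = [
    DQRule("DQ-001", "silver.orders", "completeness",
           "order_id IS NOT NULL", 1.0, 3, "data.owner@example.com"),
    DQRule("DQ-002", "silver.orders", "validity",
           "order_total >= 0", 0.995, 1, "data.owner@example.com"),
]
```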
Automation & Frameworks
- Build and operationalize automated data checks using:
  - Databricks (PySpark, Spark SQL) and/or AWS Glue (PySpark) jobs
  - DQ frameworks such as Great Expectations, Deequ, or custom rule engines
- Embed quality gates into pipelines (pre/post checks, quarantine patterns, fail-fast vs. warn policies); a PySpark sketch follows this list.
- Create reusable DQ components (rule templates, test suites, profiling modules).
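As referenced above, here is a minimal PySpark sketch of a quality gate with a quarantine pattern and a fail-fast policy, assuming a Databricks-style environment; the table names and the 99% threshold are assumptions, not a prescribed standard.

```python
from pyspark.sql import SparkSession, functions as F

# Minimal sketch of a pipeline quality gate; table names and the 99%
# threshold are illustrative assumptions.
spark = SparkSession.builder.getOrCreate()

df = spark.table("bronze.orders")

# Pre-write check: every row must satisfy the completeness rule.
rule = F.col("order_id").isNotNull() & F.col("order_ts").isNotNull()
good, bad = df.filter(rule), df.filter(~rule)

pass_rate = good.count() / max(df.count(), 1)

# Quarantine pattern: failing rows are parked for remediation, never dropped.
bad.write.mode("append").saveAsTable("quarantine.orders")

if pass_rate < 0.99:
    # Fail-fast policy: abort so bad data never reaches the Silver layer.
    raise ValueError(f"DQ gate failed: pass rate {pass_rate:.2%} below 99%")

# A warn policy would instead log the breach and continue; here we promote.
good.write.mode("append").saveAsTable("silver.orders")
```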
Monitoring, Alerting & Incident Management
- Set up DQ monitoring dashboards (Databricks dashboards / CloudWatch / third-party observability).
- Configure alerting for threshold breaches and anomalies (schema drift, outliers, volume spikes, null surges); a CloudWatch sketch follows this list.
- Perform root cause analysis and lead remediation with pipeline owners and source system teams.
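As one possible alerting path, the sketch below publishes a per-dataset null-rate metric to CloudWatch so a standard metric alarm can page on surges; the namespace, metric name, and region are assumptions, not an established convention.

```python
import boto3

# Sketch only: emit a DQ metric that a CloudWatch alarm can watch.
# Namespace, metric name, dimensions, and region are assumptions.
cloudwatch = boto3.client("cloudwatch", region_name="ap-south-1")

def publish_null_rate(dataset: str, null_rate: float) -> None:
    cloudwatch.put_metric_data(
        Namespace="DataQuality",
        MetricData=[{
            "MetricName": "NullRatePercent",
            "Dimensions": [{"Name": "Dataset", "Value": dataset}],
            "Value": null_rate * 100,
            "Unit": "Percent",
        }],
    )

publish_null_rate("silver.orders", null_rate=0.012)
```

A CloudWatch alarm on `NullRatePercent` can then notify an SNS topic for on-call routing.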
Metadata, Lineage & Governance Alignment
- Contribute DQ metadata to the data catalog/governance tool (e.g., Atlan), including rule coverage, scorecards, and dataset certification (a deliberately generic sketch follows this list).
- Support lineage-driven quality impact analysis and audit readiness.
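By way of illustration only, a generic sketch of pushing a dataset's DQ metadata to a catalog over REST; Atlan ships its own SDK and documented APIs, so the endpoint, payload shape, and token below are placeholders, not Atlan's actual interface.

```python
import requests

# Placeholder sketch: URL, payload shape, and credential are hypothetical
# stand-ins, NOT a real catalog API. A real integration would use the
# vendor's SDK and documented endpoints.
CATALOG_URL = "https://catalog.example.com/api/dq-scorecards"  # hypothetical

payload = {
    "dataset": "silver.orders",
    "rule_coverage": 0.87,   # share of columns covered by DQ rules
    "score": 0.991,          # latest composite pass rate
    "certified": True,       # dataset certification flag
}

resp = requests.post(
    CATALOG_URL,
    json=payload,
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
    timeout=30,
)
resp.raise_for_status()
```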
Quality Reporting & Scorecards
- Publish and maintain data quality scorecards by domain/dataset (daily/weekly) with trends and SLA adherence (see the rollup sketch after this list).
- Track issue backlog, triage priority, and resolution SLAs.
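As referenced above, a sketch of a daily scorecard rollup, assuming rule-level outcomes are logged to a results table; the `dq.rule_results` schema is an assumption for illustration.

```python
from pyspark.sql import SparkSession

# Sketch: roll rule-level outcomes up into a per-dataset daily scorecard.
# Assumes a results table like dq.rule_results(run_date, dataset, rule_id,
# passed); that schema is hypothetical.
spark = SparkSession.builder.getOrCreate()

scorecard = spark.sql("""
    SELECT run_date,
           dataset,
           AVG(CASE WHEN passed THEN 1.0 ELSE 0.0 END) AS pass_rate,
           COUNT(*)                                    AS rules_evaluated
    FROM dq.rule_results
    GROUP BY run_date, dataset
""")

scorecard.write.mode("overwrite").saveAsTable("dq.scorecard_daily")
```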
Must-Have Skills & Experience
- 8–12 years in Data Engineering / Data Quality Engineering, with a strong focus on DQ automation.
- Strong hands-on experience with Databricks (PySpark, Spark SQL, Delta/Iceberg concepts).
- Strong command of the AWS data ecosystem: S3, Glue (Catalog/Jobs), IAM, CloudWatch, KMS.
- Experience validating Iceberg/Parquet datasets and managing schema evolution/schema drift.
- Proficiency in Python, SQL, and writing testable data transformation logic.
- Experience in batch and/or streaming validation patterns (micro-batch/near real-time).
- Exposure to CI/CD for data pipelines (Azure DevOps/GitHub/Jenkins) and automated test execution.
- Solid understanding of data modeling and canonical layer patterns.
Nice-to-Have
- Experience with Unity Catalog, Databricks Workflows, Lakehouse Monitoring, DBSQL.
- Experience with Power BI semantic models and ensuring quality for reporting datasets.
- Familiarity with Data Observability tools (Monte Carlo/Databand/Bigeye etc.) or building internal equivalents.
- Knowledge of MDM, data governance, and policy-driven controls.
- Domain experience with Manufacturing/Supply Chain/ERP data or enterprise application sources such as ServiceNow and Workday.
Qualifications
- Bachelor's/Master's in Computer Science, Engineering, or equivalent.
- Strong communication skills to drive alignment with data owners and stakeholders.