
The Ksquare Group

Data Quality Engineer


Job Description

Position: Data Quality Engineer (8 to 12 Years)

Location: Bangalore

Notice Period: Immediate


Role Summary

We are seeking a hands-on Data Quality Engineer to design, implement, and operate automated data quality controls across an AWS + Databricks Lakehouse platform. The role ensures data can be trusted from ingestion (streaming/batch) through the curated (Gold) layer by implementing quality rules, validation frameworks, monitoring, and remediation workflows aligned with business and governance standards.

Key Responsibilities

Data Quality Strategy & Rule Implementation

  • Define and implement data quality dimensions (accuracy, completeness, timeliness, consistency, uniqueness, validity) across Bronze/Silver/Gold datasets; see the sketch after this list.
  • Partner with business/data owners to translate requirements into DQ rules, thresholds, and acceptance criteria.
  • Maintain a DQ rule repository and ensure versioning, traceability, and approvals.
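
A minimal PySpark sketch of rule checks for three of these dimensions (completeness, uniqueness, validity); the dataset, column names, and thresholds here are hypothetical illustrations, not part of the posting:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dq-rules-sketch").getOrCreate()

    # Hypothetical Silver-layer dataset; in practice this would be read from Delta/Iceberg.
    orders = spark.createDataFrame(
        [(1, "IN", 120.0), (2, None, 80.0), (2, "US", -5.0)],
        ["order_id", "country", "amount"],
    )
    total = orders.count()

    # Completeness: share of non-null country values must meet a threshold.
    completeness_ok = orders.filter(F.col("country").isNotNull()).count() / total >= 0.95

    # Uniqueness: order_id must not repeat.
    dupes = orders.groupBy("order_id").count().filter(F.col("count") > 1).count()
    uniqueness_ok = dupes == 0

    # Validity: amount must be non-negative.
    validity_ok = orders.filter(F.col("amount") < 0).count() == 0

    print({"completeness": completeness_ok, "uniqueness": uniqueness_ok, "validity": validity_ok})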

Automation & Frameworks

  • Build and operationalize automated data checks using:
      ◦ Databricks (PySpark, Spark SQL) and/or AWS Glue (PySpark) jobs
      ◦ DQ frameworks such as Great Expectations, Deequ, or custom rule engines
  • Embed quality gates into pipelines (pre/post checks, quarantine patterns, fail-fast vs warn policies); see the sketch after this list.
  • Create reusable DQ components (rule templates, test suites, profiling modules).
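
A minimal sketch of the quarantine pattern with a fail-fast vs warn policy, written as a custom rule in PySpark; the rule, policy flag, and S3 paths are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dq-gate-sketch").getOrCreate()

    df = spark.createDataFrame([(1, "a@x.com"), (2, None)], ["id", "email"])

    # Hypothetical rule: email must be present.
    rule = F.col("email").isNotNull()
    valid, invalid = df.filter(rule), df.filter(~rule)

    # Quarantine pattern: failing records are diverted for review, not silently dropped.
    invalid.write.mode("append").parquet("s3://example-bucket/quarantine/orders/")  # hypothetical path

    policy = "warn"  # or "fail_fast"
    bad = invalid.count()
    if bad and policy == "fail_fast":
        raise RuntimeError(f"DQ gate failed: {bad} record(s) quarantined")
    if bad:
        print(f"WARN: {bad} record(s) quarantined; pipeline continues")

    valid.write.mode("append").parquet("s3://example-bucket/silver/orders/")  # hypothetical path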

Monitoring, Alerting & Incident Management

  • Set up DQ monitoring dashboards (Databricks dashboards / CloudWatch / third-party observability).
  • Configure alerting for threshold breaches and anomalies (schema drift, outliers, volume spikes, null surges); see the sketch after this list.
  • Perform root cause analysis and lead remediation with pipeline owners and source system teams.
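
One common pattern for the alerting bullet above is to publish per-dataset DQ metrics as custom CloudWatch metrics and let CloudWatch alarms catch threshold breaches. A minimal boto3 sketch, with a hypothetical namespace, dataset, and metric names:

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    # Hypothetical metrics computed by a DQ job for one dataset.
    metrics = {"RowCount": 1250000.0, "NullRatePct": 0.8}

    cloudwatch.put_metric_data(
        Namespace="DataQuality",  # hypothetical namespace
        MetricData=[
            {
                "MetricName": name,
                "Dimensions": [{"Name": "Dataset", "Value": "silver.orders"}],
                "Value": value,
            }
            for name, value in metrics.items()
        ],
    )

Anomaly conditions such as null surges or volume spikes would then be expressed as alarms on these metrics rather than hard-coded in each job.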

Metadata, Lineage & Governance Alignment

  • Contribute DQ metadata into the data catalog/governance tool (e.g., Atlan) including rule coverage, scorecards, and dataset certification.
  • Support lineage-driven quality impact analysis and audit readiness.

Quality Reporting & Scorecards

  • Publish and maintain data quality scorecards by domain/dataset (daily/weekly) with trends and SLA adherence; see the sketch after this list.
  • Track issue backlog, triage priority, and resolution SLAs.
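
A minimal PySpark sketch of a daily scorecard rollup, assuming a hypothetical results table with one row per rule execution:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dq-scorecard-sketch").getOrCreate()

    # Hypothetical DQ results: dataset, rule, pass/fail, run date.
    results = spark.createDataFrame(
        [("silver.orders", "not_null_email", True, "2024-05-01"),
         ("silver.orders", "unique_order_id", False, "2024-05-01")],
        ["dataset", "rule", "passed", "checked_at"],
    )

    scorecard = (
        results.groupBy("dataset", "checked_at")
        .agg(F.count("*").alias("rules_evaluated"),
             F.sum(F.col("passed").cast("int")).alias("rules_passed"))
        .withColumn("pass_rate_pct",
                    F.round(100.0 * F.col("rules_passed") / F.col("rules_evaluated"), 1))
    )
    scorecard.show()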

Must-Have Skills & Experience

  • 8–12 years in Data Engineering / Data Quality Engineering, with a strong focus on DQ automation.
  • Strong hands-on experience with Databricks (PySpark, Spark SQL, Delta/Iceberg concepts).
  • Strong knowledge of the AWS data ecosystem: S3, Glue (Catalog/Jobs), IAM, CloudWatch, KMS.
  • Experience validating Iceberg/Parquet datasets and managing schema evolution/schema drift.
  • Proficiency in Python, SQL, and writing testable data transformation logic.
  • Experience in batch and/or streaming validation patterns (micro-batch/near real-time).
  • Exposure to CI/CD for data pipelines (Azure DevOps/GitHub/Jenkins) and automated test execution.
  • Solid understanding of data modeling and canonical layer patterns.

Nice-to-Have

  • Experience with Unity Catalog, Databricks Workflows, Lakehouse Monitoring, DBSQL.
  • Experience with Power BI semantic models and ensuring quality for reporting datasets.
  • Familiarity with data observability tools (e.g., Monte Carlo, Databand, Bigeye) or experience building internal equivalents.
  • Knowledge of MDM, data governance, and policy-driven controls.
  • Domain experience with Manufacturing/Supply Chain/ERP data and source systems such as ServiceNow or Workday.

Qualifications

  • Bachelor's/Master's in Computer Science, Engineering, or equivalent.
  • Strong communication skills to drive alignment with data owners and stakeholders.

Job ID: 137802039
