
Blue Cloud Softech Solutions Limited

Senior AI Data Management, Training & Quality Engineer

  • Posted 4 days ago

Job Description

BU / FUNCTION DESCRIPTION

We are building a new AI Transformation Center that will integrate with parts of ADITRAC (Accelerated Digital Transformation Center) and become our strategic advisory and technology partner for solving complex challenges on the journey to achieving digital objectives. We work with all business functions within the Transportation Solutions and Sensors BU to drive digitalization and create AI-driven solutions.

Our set-up is based on 3 main pillars to drive and deliver digitalization:

  • Consulting and digital / AI advisory: Partner with each function to better understand challenges and design state-of-the-art solutions and agents
  • AI Solutioning and Technology center: Mastering all technical disciplines and solutions
  • Project Management and Security: Ensuring delivery on time, on budget, and with the necessary security levels.

ROLE OBJECTIVE

AI performance is founded on data quality and data governance. This role ensures the team has the right data, with the right quality and the right controls, so model outcomes are dependable. Own the end-to-end AI data lifecycle - from governed ingestion to training/evaluation datasets, data quality gates, lineage, reproducibility, and run-time monitoring - using AWS + Databricks as the production backbone. Guide and prepare the transformation of Sensors from a dashboard-driven to an AI-driven organization.

RESPONSIBILITIES

AI Data Strategy & Ownership (Operating Model)

Translate AI use cases into data requirements

  • Features, labels, context documents, metadata, refresh cadence, retention rules.
  • Define the AI data products needed for each solution (training set, evaluation set, inference inputs, reference corpora)
  • Develop and maintain an AI data roadmap aligned to the data product roadmap specific for Sensors BU

Develop a data strategy to transform from a dashboard-oriented organization to an AI-first model

  • Collaborate with our DIA Dashboard organization (Philippines spoke team)
  • Develop a data strategy for our Sensors internal databases (e.g. SBI)

Data Ingestion & Curation on AWS + Databricks

  • Build and operate robust ingestion pipelines from enterprise sources into AWS + Databricks
  • Ensure data pipelines are:
  • Incremental (cost-aware)
  • Observed (metrics & logs)
  • Reliable (SLAs for freshness and completeness)
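The three pipeline properties above can be illustrated with a minimal sketch. This is plain Python rather than a Databricks job, and the records, SLA value, and function names are hypothetical, chosen only to show incremental (watermark-based) ingestion, observability (metric logging), and reliability (a freshness SLA check) in one place:

```python
import logging
from datetime import datetime, timedelta, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingest")

FRESHNESS_SLA = timedelta(hours=6)  # assumed SLA; tune per source

def incremental_ingest(records, last_watermark):
    """Ingest only records newer than the stored watermark (incremental,
    cost-aware), emit metrics (observed), and flag SLA breaches (reliable)."""
    # Incremental: skip anything at or before the last processed timestamp.
    new = [r for r in records if r["updated_at"] > last_watermark]

    # Observed: emit counts so a dashboard/alert can track pipeline health.
    log.info("ingested=%d skipped=%d", len(new), len(records) - len(new))

    # Reliable: warn when data freshness falls outside the agreed SLA.
    latest = max((r["updated_at"] for r in new), default=last_watermark)
    lag = datetime.now(timezone.utc) - latest
    if lag > FRESHNESS_SLA:
        log.warning("freshness SLA breached: lag=%s", lag)
    return new, latest
```

In a real deployment the watermark would be persisted (e.g. in a checkpoint table) so each run resumes where the last one stopped.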

Establish BU-oriented AI Data Governance (Unity Catalog + AWS controls)

  • Leverage Databricks Unity Catalog for table, column, and row-level controls
  • Implement classification & handling standards
  • PII/PCI/Confidential tagging
  • Retention and deletion rules (e.g., right-to-delete)
  • Audit trails and access logging

  • Define and maintain data contracts with source owners for schema, semantics, quality SLAs, and change processes
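A data contract as described above can be enforced mechanically. The sketch below is illustrative, not a Unity Catalog feature: the `CONTRACT` columns and types are hypothetical examples of what a source owner might agree to, and the check flags the breaking schema changes the contract is meant to catch:

```python
# Hypothetical data contract: column names and types agreed with the source owner.
CONTRACT = {"sensor_id": str, "reading": float, "captured_at": str}

def validate_contract(rows, contract=CONTRACT):
    """Flag breaking schema changes against the agreed contract:
    missing columns, unexpected columns, and type mismatches."""
    violations = []
    for i, row in enumerate(rows):
        missing = contract.keys() - row.keys()
        extra = row.keys() - contract.keys()
        if missing:
            violations.append((i, f"missing columns: {sorted(missing)}"))
        if extra:
            violations.append((i, f"unexpected columns: {sorted(extra)}"))
        for col, typ in contract.items():
            if col in row and not isinstance(row[col], typ):
                violations.append((i, f"{col}: expected {typ.__name__}"))
    return violations
```

Running such a check at the ingestion boundary turns the contract's change process into an enforced gate rather than a document.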

Data Quality Engineering (Hard Gates for AI Readiness)

  • Define data quality dimensions and SLAs (AI-specific):
  • Completeness, consistency, timeliness, uniqueness
  • Distribution stability (for drift-sensitive features)
  • Implement automated quality checks:
  • Schema validation (breaking changes)
  • Null/missingness thresholds
  • Referential integrity
  • Distribution checks (mean/variance, quantiles, KL divergence where appropriate)
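A minimal sketch of two of the automated checks listed above: a null/missingness threshold and a simple distribution-stability check. The thresholds are assumptions for illustration, and the z-score on means stands in for the fuller quantile/KL-divergence checks a production gate would run:

```python
import math

NULL_THRESHOLD = 0.05   # assumed: fail if more than 5% of values are missing
DRIFT_Z_LIMIT = 3.0     # assumed: flag if means differ by more than 3 std errors

def missingness(values):
    """Fraction of null values in a column batch."""
    return sum(v is None for v in values) / len(values)

def mean_drift(baseline, current):
    """Distribution-stability check: z-score of the difference in means
    between a baseline batch and the current batch."""
    def stats(xs):
        m = sum(xs) / len(xs)
        var = sum((x - m) ** 2 for x in xs) / len(xs)
        return m, var
    m0, v0 = stats(baseline)
    m1, v1 = stats(current)
    se = math.sqrt(v0 / len(baseline) + v1 / len(current)) or 1e-12
    return abs(m1 - m0) / se

def gate(values, baseline):
    """Hard gate: return False (block training) when a critical check fails."""
    clean = [v for v in values if v is not None]
    return (missingness(values) <= NULL_THRESHOLD
            and mean_drift(baseline, clean) <= DRIFT_Z_LIMIT)
```

Wired into a pipeline, a `False` result from the gate is what blocks training or deployment rather than letting degraded data through silently.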

Consider data quality dashboards & alerting:

  • Pipeline failures and/or data freshness breaches
  • Quality test failures (e.g. block training or deployment when critical checks fail)

Performance & Cost Optimization (AWS + Databricks economics)

  • Optimize data storage and compute:
  • Partitioning strategies and file sizing
  • Delta optimization/compaction strategy
  • Cluster sizing, autoscaling, job scheduling
  • Ensure cost transparency

Production Operations & Support Readiness (Run Phase)

  • Provide operational artifacts and support:
  • Runbooks (pipeline recovery, backfills, reprocessing)
  • On-call / escalation participation for data incidents
  • Root cause analysis for quality issues
  • Ensure observability via SLAs/health checks for critical pipelines

EDUCATION/KNOWLEDGE

Bachelor's degree: Computer Science, Software Engineering, Data Science, Artificial Intelligence / Machine Learning, Applied Mathematics, or Engineering (with strong CS content)

QUALIFICATIONS & EXPERIENCE

  • Data Engineering & Data Management
  • AI / ML Data Foundations
  • Data Quality Engineering
  • Cloud & Platform Fundamentals
  • Platform-Specific Qualifications (Databricks + AWS)

  • Certifications (Optional but highly valuable)
  • Databricks
  • Databricks Data Engineer Professional
  • Databricks Machine Learning Professional
  • AWS
  • AWS Certified Data Analytics - Specialty
  • AWS Solutions Architect (Associate/Professional)

5+ years of overall experience

MOTIVATIONAL/CULTURAL FIT

  • Innovation mindset
  • Problem solving
  • Proactive
  • Working in a fast-paced and dynamic environment
  • Passion for technology
  • Self development
  • Results driven
  • Clear and concise communication both locally and globally


Job ID: 144015309