Requirement Details
Primary Location: Noida
Position Overview (Job Summary):
A highly technical Data Engineering role focused on designing, building, and operationalizing data systems that support AI/ML production pipelines. The position centers on ingesting unstructured and structured data, building high-scale automated pipelines, implementing feature stores, supporting vector databases, and enabling MLOps workflows for reproducible AI model development.
Primary Skills:
- Python (expert-level)
- SQL (advanced, tuning)
- Apache Spark / PySpark
- Apache Kafka (streaming)
- ETL/ELT pipeline development
- Feature Stores (Tecton, Feast)
- Vector Databases (Pinecone, Milvus)
- Cloud services: AWS Glue, Azure Data Factory, Google Vertex AI
- Handling unstructured/semi-structured data (Parquet, JSON, Avro, text)
- Data pipeline orchestration (Airflow)
- Delta Lake / Lakehouse architectures
Secondary Skills:
- Hugging Face Datasets
- PyTorch / TensorFlow Data Loaders
- dbt (Data Build Tool)
- NoSQL (MongoDB, Cassandra)
- Distributed computing frameworks (Flink)
- Data quality automation & unit testing
- MLOps integration: data versioning, lineage
- AI/ML pipeline collaboration with data scientists
Experience:
- 6 to 12+ years in Data Engineering
- Minimum 2 years supporting production-grade AI/ML pipelines
- Band: 3.1 to 4.2
Role and Responsibilities:
A. Key Responsibilities
- Build robust, automated ETL/ELT pipelines for AI-ready datasets.
- Perform feature engineering: cleaning, normalizing, and structuring complex data.
- Develop and maintain Feature Stores to support both training and real-time inference.
- Manage distributed, large-scale (petabyte-level) data processing using Spark/Flink.
- Populate, index, and optimize vector databases for Generative AI/RAG workloads.
- Implement data quality checks, unit tests, and bias detection mechanisms.
- Support MLOps workflows: data versioning, lineage, reproducibility.
- Collaborate closely with ML Engineers and Data Scientists for model development.
B. Additional Responsibilities
- Work cross-functionally within Digital Foundation teams.
- Ensure pipeline scalability, performance optimization, and automation maturity.
- Prevent training-serving skew through structured data management practices.
- Provide infrastructure support enabling rapid model training and deployment.
- Contribute to best practices in AI data engineering and cloud-native architectures.
Educational Qualification:
- Bachelor's or Master's degree in:
  - Computer Science
  - Information Systems
  - Engineering
  - Or a related technical field
Certifications:
(Not mandatory, but beneficial)
- Cloud certifications (AWS/Azure/GCP)
- Databricks/Spark certifications
- MLOps / ML engineering certifications
- Kafka, Airflow, or dbt certifications