
HCLTech

AI Data Engineer

6-12 Years
  • Posted 2 days ago

Job Description

Requirement Details

Primary Location:

Noida

Position Overview (Job Summary):

A highly technical Data Engineering role focused on designing, building, and operationalizing data systems that support AI/ML production pipelines. The position centers on ingesting unstructured and structured data, building high-scale automated pipelines, implementing feature stores, supporting vector databases, and enabling MLOps workflows for reproducible AI model development.

Primary Skills:

  • Python (expert-level)
  • SQL (advanced, including query tuning)
  • Apache Spark / PySpark
  • Apache Kafka (streaming)
  • ETL/ELT pipeline development
  • Feature Stores (Tecton, Feast)
  • Vector Databases (Pinecone, Milvus)
  • Cloud services: AWS Glue, Azure Data Factory, Google Vertex AI
  • Handling unstructured/semi-structured data (Parquet, JSON, Avro, text)
  • Data pipeline orchestration (Airflow)
  • Delta Lake / Lakehouse architectures

Secondary Skills:

  • Hugging Face Datasets
  • PyTorch / TensorFlow Data Loaders
  • dbt (Data Build Tool)
  • NoSQL (MongoDB, Cassandra)
  • Distributed computing frameworks (Flink)
  • Data quality automation & unit testing
  • MLOps integration: data versioning, lineage
  • AI/ML pipeline collaboration with data scientists

Experience:

  • 6 to 12+ years in Data Engineering
  • Minimum 2 years supporting production-grade AI/ML pipelines
  • Band: 3.1 to 4.2

Role and Responsibilities:

A. Key Responsibilities

  • Build robust, automated ETL/ELT pipelines for AI-ready datasets.
  • Perform feature engineering: cleaning, normalizing, and structuring complex data.
  • Develop and maintain Feature Stores to support both training and real-time inference.
  • Manage distributed, large-scale (petabyte-level) data processing using Spark/Flink.
  • Populate, index, and optimize vector databases for Generative AI/RAG workloads.
  • Implement data quality checks, unit tests, and bias detection mechanisms.
  • Support MLOps workflows: data versioning, lineage, reproducibility.
  • Collaborate closely with ML Engineers and Data Scientists for model development.

B. Additional Responsibilities

  • Work cross-functionally within Digital Foundation teams.
  • Ensure pipeline scalability, performance optimization, and automation maturity.
  • Prevent training-serving skew through structured data management practices.
  • Provide infrastructure support enabling rapid model training and deployment.
  • Contribute to best practices in AI data engineering and cloud-native architectures.

Educational Qualification:

  • Bachelor's or Master's degree in Computer Science, Information Systems, Engineering, or a related technical field

Certifications:

None mandatory; the following are beneficial:

  • Cloud certifications (AWS/Azure/GCP)
  • Databricks/Spark certifications
  • MLOps / ML engineering certifications
  • Kafka, Airflow, or dbt certifications

Job ID: 144011887