AHEAD

GenAI Data ETL Engineer

  • Posted 15 hours ago

Job Description

AHEAD builds platforms for digital business. By weaving together advances in cloud infrastructure, automation and analytics, and software delivery, we help enterprises deliver on the promise of digital transformation.

At AHEAD, we prioritize creating a culture of belonging, where all perspectives and voices are represented, valued, respected, and heard. We create spaces to empower everyone to speak up, make change, and drive the culture at AHEAD.

We are an equal opportunity employer, and do not discriminate based on an individual's race, national origin, color, gender, gender identity, gender expression, sexual orientation, religion, age, disability, marital status, or any other protected characteristic under applicable law, whether actual or perceived.

We embrace all candidates that will contribute to the diversification and enrichment of ideas and perspectives at AHEAD.

We are seeking a GenAI Data Engineer (Data Integration & Retrieval) to design, build, and operate the data pipelines that power our LLM-based applications, agents, and analytics. This role sits at the intersection of data engineering and generative AI, with a focus on turning messy, distributed enterprise data into high-quality context for retrieval-augmented generation (RAG), copilots, and intelligent automation.

You will partner closely with the Platform and Use Cases Teams, GenAI/ML engineers, and business stakeholders to deliver robust, observable, and future-proof data flows that keep us ahead of where the industry is going.

Key Responsibilities

GenAI / RAG Data Pipeline Development

  • Design, develop, and maintain ETL/ELT pipelines that ingest structured and unstructured data (databases, documents, tickets, logs, wikis, APIs, SaaS apps) into vector stores, search indexes, and feature tables that power GenAI use cases.
  • Implement document and record transformations including chunking, metadata enrichment, normalization, deduplication, and PII redaction for safe and high-quality LLM context.
  • Build and evolve semantic data models that reflect how LLMs consume context (e.g., knowledge domains, entities, relationships, access controls) rather than only traditional star schemas.
  • Optimize pipelines for performance, reliability, and cost (incremental loads, CDC, partitioning, caching, adaptive refresh strategies) in support of low-latency GenAI experiences.
  • Implement data quality checks and evaluations tailored to GenAI workloads (e.g., coverage of knowledge domains, freshness, retrieval accuracy, hallucination risk signals).

LLM & Integration Engineering

  • Design and implement system-to-system integrations that consolidate context for GenAI from SaaS platforms and internal systems (CRM, ITSM/ticketing, ERP, knowledge bases, collaboration tools).
  • Work with GenAI engineers to wire data pipelines into LLM orchestration flows (e.g., RAG, tools/agents, workflows), ensuring clean interfaces and robust contracts.
  • Build and maintain prompt/response logging, retrieval traces, and feedback capture to enable experimentation, evaluation, and continuous improvement.
  • Ensure integrations and pipelines are secure, auditable, and compliant, including access controls, row/column-level permissions, and policy-driven redaction for LLM consumption.
  • Collaborate with application and platform teams to define SLAs, schemas, and APIs for data contracts that support GenAI services.

Operations, Monitoring, and Documentation

  • Set up scheduling, orchestration, and workflow management for GenAI data pipelines (e.g., Airflow, Prefect, Dagster, cloud-native orchestrators).
  • Implement observability for data and retrieval: pipeline health, data freshness, vector store/index stats, retrieval coverage, and failure modes that impact LLM behavior.
  • Diagnose and resolve pipeline and integration issues, performing root-cause analysis across data sources, transformations, and downstream GenAI applications.
  • Maintain clear documentation of data flows, lineage, schemas, mappings, and runbooks, with a focus on how they support specific GenAI use cases.
  • Partner with data governance and architecture to enforce naming standards, lineage, and metadata practices that enable safe and explainable GenAI.

Education

  • Minimum Required: Bachelor's degree in Computer Science, Information Systems, or similar

Skills Required

  • 5+ years of experience in data engineering, ETL/ELT development, or data integration roles.
  • Strong SQL skills (complex joins, window functions, performance tuning) across analytical and operational workloads.
  • Hands-on experience with at least one modern data pipeline / transformation framework (e.g., dbt, Airflow/Prefect/Dagster, cloud-native ETL, or custom Python/SQL pipelines).
  • Experience building and maintaining data pipelines on cloud data platforms (e.g., Snowflake, BigQuery, Redshift, Synapse, or equivalent).
  • Proficiency in Python (preferred) or another programming language commonly used in data workflows (e.g., Java, Scala), including working with APIs and JSON.
  • Experience working with REST APIs, webhooks, JSON, CSV, and other common integration formats.
  • Solid understanding of data modeling and integration concepts (relational modeling, denormalization, CDC, event-driven or log-based ingestion).
  • Familiarity with version control (Git) and standard software engineering practices (code review, branching strategies, CI/CD basics).
  • Demonstrated exposure to LLMs / GenAI, whether through personal projects, pilots, or production systems.

Preferred Skills

  • Experience with LLM-centric data patterns, such as retrieval-augmented generation (RAG), semantic search, or document intelligence.
  • Hands-on experience with vector databases or search technologies (e.g., Pinecone, Weaviate, pgvector, OpenSearch, Elasticsearch, Vespa).
  • Experience with workflow orchestration tools (e.g., Apache Airflow, Prefect, Dagster, Azure Data Factory, AWS Glue workflows).
  • Exposure to message-based or streaming integrations (e.g., Kafka, Kinesis, Pub/Sub, EventBridge) for near-real-time data and event feeds into GenAI systems.
  • Experience in data quality and observability (e.g., Great Expectations, Monte Carlo, Soda, or custom checks/alerts).
  • Knowledge of at least one cloud platform (AWS, Azure, GCP) and its data/AI services (e.g., object storage, serverless compute, managed warehouses, managed LLMs or embeddings).
  • Familiarity with security and compliance concepts: data classification, encryption, access controls, secrets management, and safe handling of PII/regulated data.

Nice to Have

  • Experience partnering with ML/GenAI teams, including feature pipelines, evaluation datasets, or MLOps practices.
  • Experience with BI / analytics tools (e.g., Power BI, Tableau, Looker) and understanding how analytical needs intersect with GenAI use cases.
  • Background with data catalogs, lineage tools, or knowledge graphs that help organize enterprise knowledge for GenAI.

Why AHEAD

Through our daily work and internal groups like Moving Women AHEAD and RISE AHEAD, we value and benefit from diversity of people, ideas, experience, and everything in between.

We fuel growth by stacking our office with top-notch technologies in a multi-million-dollar lab, encouraging cross-department training and development, and sponsoring certifications and credentials for continued learning.

India Employment Benefits Include

  • Comprehensive health insurance coverage for employees, with options to extend coverage to dependents
  • Paid time off and company holidays, along with additional leave benefits as per policy
  • Flexible work arrangements, supporting work-life balance
  • Learning and development opportunities to support continuous growth and upskilling
  • Employee wellness initiatives and programs focused on physical and mental well-being
  • Retirement and statutory benefits in line with India regulations
  • Inclusive and people-first culture, with a strong focus on collaboration and ownership

Job ID: 144459315