  • Posted 16 hours ago

Job Description: AI QA Engineer

Experience: 7+ Years

Location: Gurugram

Work Mode: Hybrid - 3 Days WFO

Job Summary

We are seeking an AI QA Engineer to ensure the quality, accuracy, and performance of our enterprise-grade Natural Language to SQL (NL2SQL) pipeline. You will be responsible for validating a complex, multi-stage AI architecture, including semantic routing, LLM-based disambiguation, and query generation, ensuring it securely and accurately translates user intent into valid queries within the BFSI domain.

Key Responsibilities

  • LLM & Pipeline Evaluation: Design and execute automated evaluations for a 4-stage NL2SQL pipeline using LangSmith. Monitor metrics such as structural F1, execution accuracy, latency, and token cost.

  • Dataset Management: Create, curate, and maintain benchmark/golden datasets for continuous regression testing of LLM prompts and model outputs.

  • Search & Retrieval Testing: Validate precision and recall trade-offs in semantic search and schema discovery, ensuring optimal candidate selection for downstream query generation.

  • Failure Analysis & Debugging: Perform root cause analysis across pipeline stages (routing, disambiguation, query generation, execution), identifying issues such as schema mismatches, type/coercion errors, runtime incompatibilities, and query structure failures.

  • E2E & API Automation: Develop automated test scripts using Python (Pytest) for backend API testing and Playwright for the React frontend, validating end-to-end user workflows.

  • Observability & Debugging: Utilize Grafana and structured JSONL logs to identify pipeline bottlenecks, LLM hallucinations, or prompt degradation.

  • Compliance & Security: Ensure the AI pipeline meets strict BFSI data security standards by validating execution safety mechanisms (e.g., runtime capability probing, injection prevention). Design validation rules and guardrails for AI pipelines to prevent invalid query generation and runtime failures.

Required Skills

  • AI/LLM Testing: Experience testing LLM applications, RAG (Retrieval-Augmented Generation) pipelines, or NLP models. Familiarity with AI evaluation frameworks (e.g., LangSmith, DeepEval, or similar).

  • Languages: Strong proficiency in Python 3.12+ (crucial for integrating with the existing AI backend and Pytest suite). Secondary experience with JavaScript/TypeScript.

  • Test Automation: Expertise in REST API testing; UI automation using Playwright is optional.

  • Data & Search: Understanding of Vector Databases (e.g., Milvus, Pinecone) and semantic search concepts (embeddings, hybrid search).

  • Data & SQL Validation: Solid understanding of SQL and data validation techniques to verify correctness of complex query outputs.

  • Tools & Infrastructure: Git, Docker, CI/CD pipelines, and observability tools (Prometheus/Grafana).

Education

  • BE / BTech / MCA / BSc in Computer Science, Data Science, or a related field.

Nice to Have

  • Familiarity with Graph Databases (Neo4j) and LangGraph orchestration.
  • Experience evaluating foundation models (OpenAI, Anthropic, Google).
  • Prior exposure to query languages such as SQL or PURE, or to any functional programming language.

  • Experience testing workflows across multiple services or pipelines, with an understanding of failure handling, retries, and system reliability concepts.

  • Experience in Banking, Financial Services, or Insurance domains.
  • Understanding of data security, compliance, and enterprise

Job ID: 144956323