TIGI HR Solution

AI QA Engineer

  • Posted 6 days ago

Job Description

Job Summary

We are seeking an AI QA Engineer to ensure the quality, accuracy, and performance of our enterprise-grade Natural Language to SQL (NL2SQL) pipeline. You will validate a complex, multi-stage AI architecture (semantic routing, LLM-based disambiguation, and query generation) and ensure that it securely and accurately translates user intent into valid queries within the BFSI (Banking, Financial Services, and Insurance) domain.

Experience: 7+ Years

Location: Gurugram

Work Mode: Hybrid - 3 Days WFO

Employment Type: Full-Time

Key Responsibilities

  • LLM & Pipeline Evaluation: Design and execute automated evaluations for a 4-stage NL2SQL pipeline using LangSmith. Monitor metrics such as structural F1, execution accuracy, latency, and token cost.
  • Dataset Management: Create, curate, and maintain benchmark/golden datasets for continuous regression testing of LLM prompts and model outputs.
  • Search & Retrieval Testing: Validate precision and recall trade-offs in semantic search and schema discovery, ensuring optimal candidate selection for downstream query generation.
  • Failure Analysis & Debugging: Perform root cause analysis across pipeline stages (routing, disambiguation, query generation, execution), identifying issues such as schema mismatches, type/coercion errors, runtime incompatibilities, and query structure failures.
  • E2E & API Automation: Develop automated test scripts using Python (Pytest) for backend API testing and Playwright for the React frontend, validating end-to-end user workflows.
  • Observability & Debugging: Utilize Grafana and structured JSONL logs to identify pipeline bottlenecks, LLM hallucinations, or prompt degradation.
  • Compliance & Security: Ensure the AI pipeline meets strict BFSI data security standards by validating execution safety mechanisms (e.g., runtime capability probing, injection prevention). Design validation rules and guardrails for AI pipelines to prevent invalid query generation and runtime failures.

Required Skills

  • AI/LLM Testing: Experience testing LLM applications, RAG (Retrieval-Augmented Generation) pipelines, or NLP models. Familiarity with AI evaluation frameworks (e.g., LangSmith, DeepEval, or similar).
  • Languages: Strong proficiency in Python 3.12+ (crucial for integrating with the existing AI backend and Pytest suite). Secondary experience with JavaScript/TypeScript.
  • Test Automation: Expertise in API testing (REST) and optional UI automation using Playwright.
  • Data & Search: Understanding of Vector Databases (e.g., Milvus, Pinecone) and semantic search concepts (embeddings, hybrid search).
  • Data & SQL Validation: Solid understanding of SQL and data validation techniques to verify correctness of complex query outputs.
  • Tools & Infrastructure: Git, Docker, CI/CD pipelines, and observability tools (Prometheus/Grafana).

Education

  • BE / BTech / MCA / BSc in Computer Science, Data Science, or a related field.

Nice to Have

  • Familiarity with Graph Databases (Neo4j) and LangGraph orchestration.
  • Experience evaluating foundation models (OpenAI, Anthropic, Google).
  • Prior exposure to query languages such as SQL or PURE, or to any functional programming language.
  • Experience testing workflows across multiple services or pipelines, with an understanding of failure handling, retries, and system reliability concepts.
  • Experience in Banking, Financial Services, or Insurance domains.
  • Understanding of data security, compliance, and enterprise database schemas.

Job ID: 145785267
