Job Description: AI QA Engineer
Experience: 7+ Years
Location: Gurugram
Work Mode: Hybrid - 3 Days WFO
Job Summary
We are seeking an AI QA Engineer to ensure the quality, accuracy, and performance of our
enterprise-grade Natural Language to SQL (NL2SQL) pipeline. You will be responsible for
validating a complex, multi-stage AI architecture, including semantic routing, LLM-based
disambiguation, and query generation, and ensuring it securely and accurately translates
user intent into valid queries within the BFSI domain.
Key Responsibilities
- LLM & Pipeline Evaluation: Design and execute automated evaluations for a 4-stage
NL2SQL pipeline using LangSmith. Monitor metrics such as structural F1, execution
accuracy, latency, and token cost.
- Dataset Management: Create, curate, and maintain benchmark/golden datasets for
continuous regression testing of LLM prompts and model outputs.
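Golden-dataset regression runs of this kind are often structured as a parametrized Pytest suite. A minimal sketch follows; the case data, the `generate_sql` entry point, and the normalization rule are illustrative assumptions, not details of this role's actual pipeline:

```python
import pytest

# Illustrative inline golden cases; in practice these would be curated in a
# version-controlled JSONL file (schema and names here are assumptions).
GOLDEN_CASES = [
    {"question": "total loans booked in 2023",
     "expected_sql": "SELECT COUNT(*) FROM loans WHERE book_year = 2023"},
    {"question": "average balance per branch",
     "expected_sql": "SELECT branch_id, AVG(balance) FROM accounts GROUP BY branch_id"},
]

def normalize_sql(sql: str) -> str:
    """Collapse case and whitespace so cosmetic diffs don't fail the run."""
    return " ".join(sql.lower().split())

def generate_sql(question: str) -> str:
    """Placeholder for the real NL2SQL pipeline call (assumed interface)."""
    # Echo the expected answer so the sketch is self-contained and runnable.
    lookup = {c["question"]: c["expected_sql"] for c in GOLDEN_CASES}
    return lookup[question]

@pytest.mark.parametrize("case", GOLDEN_CASES, ids=lambda c: c["question"])
def test_nl2sql_regression(case):
    predicted = generate_sql(case["question"])
    assert normalize_sql(predicted) == normalize_sql(case["expected_sql"])
```

Because each golden case is a separate parametrized test, a prompt or model change that regresses one question fails only that case, which makes triage faster.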
- Search & Retrieval Testing: Validate precision and recall trade-offs in semantic search and
schema discovery, ensuring optimal candidate selection for downstream query generation.
- Failure Analysis & Debugging: Perform root cause analysis across pipeline stages (routing,
disambiguation, query generation, execution), identifying issues such as schema mismatches,
type/coercion errors, runtime incompatibilities, and query structure failures.
- E2E & API Automation: Develop automated test scripts using Python (Pytest) for
backend API testing and Playwright for the React frontend, validating end-to-end user
workflows.
- Observability & Debugging: Utilize Grafana and structured JSONL logs to identify
pipeline bottlenecks, LLM hallucinations, or prompt degradation.
- Compliance & Security: Ensure the AI pipeline meets strict BFSI data security standards by
validating execution safety mechanisms (e.g., runtime capability probing, injection
prevention). Design validation rules and guardrails for AI pipelines to prevent invalid
query generation and runtime failures.
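A guardrail of this kind can be sketched as a simple pre-execution validator for generated SQL. The rules below (read-only, single statement, keyword denylist) are illustrative assumptions, not the project's actual policy:

```python
import re

# Denylist of statement-altering keywords; real policies would be broader
# and likely AST-based rather than regex-based.
BANNED = re.compile(r"\b(insert|update|delete|drop|alter|truncate|grant|exec)\b", re.I)

def is_safe_select(sql: str) -> bool:
    """Reject generated SQL that is not a single read-only SELECT statement."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:  # reject multi-statement payloads (classic injection vector)
        return False
    if not stmt.lower().startswith("select"):
        return False
    if BANNED.search(stmt):
        return False
    return True
```

Running every generated query through such a validator before execution converts an entire class of runtime failures and injection attempts into cheap, testable rejections.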
Required Skills
- AI/LLM Testing: Experience testing LLM applications, RAG (Retrieval-Augmented
Generation) pipelines, or NLP models. Familiarity with AI evaluation frameworks (e.g.,
LangSmith, DeepEval, or similar).
- Languages: Strong proficiency in Python 3.12+ (crucial for integrating with the existing AI
backend and Pytest suite). Secondary experience with JavaScript/TypeScript.
- Test Automation: Expertise in API testing (REST) and, optionally, UI automation using
Playwright.
- Data & Search: Understanding of Vector Databases (e.g., Milvus, Pinecone) and semantic
search concepts (embeddings, hybrid search).
- Data & SQL Validation: Solid understanding of SQL and data validation techniques to
verify correctness of complex query outputs.
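Correctness of complex query outputs is often verified by executing both the gold and the generated SQL against a fixture database and comparing result sets. The schema, data, and queries below are made-up examples of that pattern, not artifacts of this role:

```python
import sqlite3

def result_set(conn: sqlite3.Connection, sql: str) -> list:
    """Return query rows sorted, so row order does not affect the comparison."""
    return sorted(conn.execute(sql).fetchall())

# Small in-memory fixture database (illustrative schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER, branch TEXT, balance REAL);
    INSERT INTO accounts VALUES (1, 'N', 100.0), (2, 'S', 250.0), (3, 'N', 50.0);
""")

gold = "SELECT branch, SUM(balance) FROM accounts GROUP BY branch"
predicted = "SELECT branch, SUM(balance) AS total FROM accounts GROUP BY branch ORDER BY branch"

# Execution accuracy: the queries agree if they return the same rows,
# even though the SQL text differs (alias, explicit ORDER BY).
assert result_set(conn, gold) == result_set(conn, predicted)
```

Comparing executed results rather than SQL strings is what distinguishes execution accuracy from structural metrics like structural F1.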
- Tools & Infrastructure: Git, Docker, CI/CD pipelines, and observability tools
(Prometheus/Grafana).
Education
- BE / BTech / MCA / BSc in Computer Science, Data Science, or a related field.
Nice to Have
- Familiarity with Graph Databases (Neo4j) and LangGraph orchestration.
- Experience evaluating foundational LLM models (OpenAI, Anthropic, Google).
- Prior exposure to query languages such as SQL or PURE, or to functional programming
languages.
- Experience testing workflows across multiple services or pipelines, with an understanding
of failure handling, retries, and system reliability concepts.
- Experience in Banking, Financial Services, or Insurance domains.
- Understanding of data security, compliance, and enterprise