Job Description
MODEL VALIDATOR
Eval set development - ability to benchmark agent performance across reasoning paths
Adversarial testing - ability to break the agent by giving it conflicting instructions, etc.
Stochastic regression testing - measure variance in agent behavior across repeated runs
Tool call validation - verify the agent calls the correct external APIs and databases
Ability to review thought chains and identify where agent logic diverged from the BRD (Business Requirements Document)
Working knowledge of applying judge LLMs to grade agent outputs
Python and frameworks - proficiency in DeepEval, LangSmith, etc.
Semantic debugging - ability to inspect the agent's thought traces and pinpoint faulty reasoning steps
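As a rough illustration of the stochastic regression testing and judge-LLM grading skills listed above, the sketch below runs an agent repeatedly on the same prompt and measures the variance of graded scores. It is a minimal, hedged example: `run_agent` and `judge` are hypothetical stand-ins (a real harness would call the deployed agent and prompt a judge LLM with a rubric, e.g. via DeepEval or LangSmith), not an actual implementation.

```python
import statistics

def run_agent(prompt: str, seed: int) -> str:
    # Hypothetical agent under test. Real agents are nondeterministic;
    # here the seed simulates run-to-run variation for illustration.
    return f"answer-{seed % 2}"

def judge(output: str, expected: str) -> float:
    # Hypothetical judge: a real setup would prompt a judge LLM with a
    # grading rubric and parse its score. Exact match is a stand-in.
    return 1.0 if output == expected else 0.0

def stochastic_regression(prompt: str, expected: str, runs: int = 10) -> dict:
    # Run the agent several times and grade each output, then report
    # the pass rate and the variance of the scores across runs.
    scores = [judge(run_agent(prompt, seed=i), expected) for i in range(runs)]
    return {
        "pass_rate": sum(scores) / runs,
        "variance": statistics.pvariance(scores),
    }

result = stochastic_regression("Process a refund for order 123", "answer-0")
```

A high variance here flags unstable agent behavior even when the average pass rate looks acceptable, which is the point of running the same eval case many times.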
Screening Criteria
SDETs preferred, as their coding background in testing equips them to develop evals
Good knowledge of data/SQL-based testing
Domain background is an added advantage for such roles