
Role: Freelance Gen AI Testing QA Evaluation Engineer (Senior)
Company Name: WillWare Technologies
Work Model: Remote / Contract / Full-time
Experience: 5+ Years
Work Location: Chennai/Bangalore/Kochi/Jaipur/Coimbatore/Remote
Job description:
Experience:
• 5-8 years in QA automation, with 1-3 years in GenAI / API-based testing.
Key Responsibilities:
• Develop and maintain automated evaluation pipelines.
• Implement evaluation scripts using Python frameworks (e.g., DeepEval, custom frameworks)
• Integrate LLM/Chatbot APIs and agent workflows into evaluation pipelines
• Execute dataset-driven evaluations, then capture and process the responses.
• Support manual test scenario execution and validation
• Assist in dataset creation and enrichment
• Generate evaluation reports and logs.
• Debug and troubleshoot execution issues.
• Enable CI/CD integration for continuous evaluation.
Key Skills:
Core GenAI Evaluation Skills:
• Experience with evaluation frameworks (e.g., DeepEval or Arize)
• Understanding of LLM-as-a-Judge methodology (e.g., G-Eval)
• Strong prompt engineering and evaluation design skills
• Experience in manual evaluation of LLM outputs.
Technical Skills:
• Strong programming in Python
• Experience in API testing and integration
• Proficiency in JSON handling, parsing, and data processing
• Automation framework development/integration.
• Knowledge of logging, reporting, and debugging tools
Agent Manual Testing & Dataset Skills:
• Experience in:
o Test scenario creation for GenAI use cases
o Manual validation of LLM responses (qualitative assessment)
o Dataset creation and curation
o Writing expected outputs or golden answers.
• Ability to design edge cases, negative scenarios and adversarial inputs (prompt injection, jailbreaks)
Domain & QA Skills:
• Strong foundation in software testing principles:
o Functional, integration, regression testing
• Experience in test design, defect tracking, and reporting.
• Strong analytical and problem-solving skills.
• Conversational AI testing experience.
• Understanding of AI agent behavior, workflows, and edge cases.
Job ID: 146997885