Role Mission
Own functional test automation of our internal AI workbench — a feature-heavy web application where many users author and manage agents, prompts, SOPs, tools, testing surfaces, and analytics dashboards. The workbench grows fast and is interaction-heavy; conventional handwritten UI automation will not keep up. You will build an AI-assisted automation strategy — using AI agents to drive browser flows, generate test data, maintain selectors, and triage failures — so that functional coverage scales with the product.
Key Responsibilities
- Design and own the functional test automation strategy for the workbench — covering agent management, prompt and SOP authoring, tool registration, testing surfaces, analytics dashboards, user/role management, and all major user journeys.
- Build AI-assisted UI automation: LLM-driven agents that generate, execute, and maintain Playwright (or equivalent) tests; that adapt selectors as the UI evolves; and that triage failures into actionable bug reports rather than flaky red noise.
- Build synthetic test data generators — agents, prompts, SOPs, tool definitions, conversation traces — to seed workbench environments and exercise dashboards under realistic data shapes.
- Validate the analytics layer end-to-end — events emitted from the runtime, ingested by the analytics pipeline, surfaced in workbench dashboards. Catch silent data-loss and aggregation bugs early.
- Own API-level functional tests for the workbench backend, alongside the UI suite — and pick the right level for each test.
- Maintain CI/CD test gates for the workbench and ensure that workbench deploys do not break the runtime systems the workbench controls.
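To give a concrete flavor of the selector-maintenance work above: a minimal, hypothetical Python sketch of a ranked-fallback locator. All names here (`query_with_fallback`, the example selectors) are illustrative, not part of any existing suite; `find` stands in for any locator call, such as a thin wrapper around Playwright's `page.query_selector`.

```python
from typing import Callable, Optional, Sequence

def query_with_fallback(
    find: Callable[[str], Optional[object]],
    candidates: Sequence[str],
) -> tuple[Optional[object], Optional[str]]:
    """Try a ranked list of candidate selectors; return the first hit
    and the selector that matched (None, None if all miss)."""
    for selector in candidates:
        element = find(selector)
        if element is not None:
            return element, selector
    return None, None

# An AI maintenance loop would record which fallback actually matched,
# then promote it (or ask an LLM to propose fresh candidates) so the
# suite heals itself as the UI evolves. Here a dict stands in for the DOM.
dom = {"[data-testid='save-agent']": "button#42"}
element, used = query_with_fallback(dom.get, [
    "#save-btn",                   # stale primary selector
    "[data-testid='save-agent']",  # durable fallback
])
```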
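The synthetic-data responsibility can look like this: a small, hypothetical generator for agent definitions, seeded so failures reproduce exactly. Field names and value pools are invented for illustration.

```python
import random
import uuid

def make_agent(rng: random.Random) -> dict:
    """Generate one synthetic agent definition with a realistic shape."""
    return {
        "id": str(uuid.UUID(int=rng.getrandbits(128))),
        "name": f"agent-{rng.randrange(10_000)}",
        "prompt": rng.choice(["triage", "summarize", "route"]) + " the ticket",
        "tools": rng.sample(["search", "calendar", "crm", "email"],
                            k=rng.randint(1, 3)),
    }

def seed_workbench(n: int, seed: int = 0) -> list[dict]:
    """Deterministically generate n agents: same seed, same data,
    so a failing dashboard test can be replayed byte-for-byte."""
    rng = random.Random(seed)
    return [make_agent(rng) for _ in range(n)]

agents = seed_workbench(50)
```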
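And for the analytics layer, one common end-to-end check is reconciling events emitted by the runtime against what the dashboard reports. A minimal sketch, with an assumed event shape (`{"type": ...}`) and assumed per-type dashboard counts:

```python
from collections import Counter

def reconcile(emitted: list[dict],
              dashboard: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {event_type: (emitted, shown)} for every mismatch.
    Silent data loss shows up as shown < emitted; double-counting
    in the aggregation layer shows up as shown > emitted."""
    expected = Counter(e["type"] for e in emitted)
    mismatches = {}
    for event_type in expected.keys() | dashboard.keys():
        e, s = expected.get(event_type, 0), dashboard.get(event_type, 0)
        if e != s:
            mismatches[event_type] = (e, s)
    return mismatches

events = [{"type": "prompt_run"}, {"type": "prompt_run"}, {"type": "tool_call"}]
shown = {"prompt_run": 1, "tool_call": 1}  # dashboard silently dropped one run
drift = reconcile(events, shown)
```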
Must-Haves
- 5–8 years in QA / SDET / quality engineering with strong test-automation craft on web applications.
- Strong frontend automation fluency — Playwright preferred — and the judgement to know when not to test through the UI.
- Strong API and integration testing — REST, GraphQL, event-driven systems.
- AI-assisted test automation experience required — you have used (or built convincing prototypes with) LLM-driven Playwright/Selenium, browser-use, computer-use, OpenAI Agents SDK, or comparable tooling to author or maintain functional tests. You can articulate where these techniques pay off and where they don't.
- Experience testing LLM / GenAI or AI-platform applications required — you have tested products where the underlying domain involves prompts, agents, tools, or eval workflows, and you understand how those concepts surface in a UI.
- Strong Python and TypeScript / JavaScript.
- CI/CD discipline — GitHub Actions / GitLab CI / similar — and a track record of test suites that stay green and fast.
- Data-pipeline / analytics testing fundamentals — event correctness, aggregation correctness, dashboard validation.