Terminal Bench Expert (Freelancer)

deccan ai experts

Mumbai, India

3-5 Years

Posted 13 days ago

Job Description

Overview:

Design and build challenging, real-world terminal-based tasks for evaluating frontier AI agents. Tasks must be genuinely difficult, clearly specified, and programmatically verifiable.

Responsibilities:

Design high-quality task ideas rooted in real-world workflows (debugging, infra setup, data pipelines, security, ML training, etc.)
Write clear, unambiguous task instructions with defined end states
Build Docker environments and write oracle solutions that pass all tests
Write deterministic pytest-based verification scripts
Identify edge cases and ensure tasks can't be shortcut or gamed by AI agents
Iterate with reviewers based on QC and platform gate feedback

Must-Haves:

3–5+ years of hands-on engineering experience in at least one domain (software engineering, DevOps, ML, security, data engineering, scientific computing)
Proficiency in Python and shell scripting (bash)
Comfortable writing Dockerfiles, building images, and debugging containers
Experience writing automated tests (pytest, unittest)
Familiarity with Git workflows (PRs, diffs, branching)
Strong technical writing: the ability to produce precise, unambiguous specifications

Nice-to-Haves: