deccan ai experts

Terminal Bench Expert (Freelancer)

  • Posted 13 days ago

Job Description

Overview:

Design and build challenging, real-world terminal-based tasks for evaluating frontier AI agents. Tasks must be genuinely difficult, clearly specified, and programmatically verifiable.

Responsibilities:

  • Design high-quality task ideas rooted in real-world workflows (debugging, infra setup, data pipelines, security, ML training, etc.)
  • Write clear, unambiguous task instructions with defined end states
  • Build Docker environments and write oracle solutions that pass all tests
  • Write deterministic pytest-based verification scripts
  • Identify edge cases and ensure tasks can't be shortcut or gamed by AI agents
  • Iterate with reviewers based on QC and platform gate feedback
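
For illustration only, a deterministic verification script of the kind described above might look like the sketch below. The task itself (leave a file of unique, lowercased, sorted words), the file name `words.txt`, and the helper names are hypothetical and not taken from any actual task or platform.

```python
import tempfile
from pathlib import Path

# Hypothetical task input; a real task would ship this inside its Docker image.
INPUT_WORDS = ["pear", "Apple", "pear", "fig"]


def expected_lines(words):
    """Derive the required end state from the input, independent of any solution."""
    return sorted({w.lower() for w in words})


def check_output(path: Path) -> None:
    """Assert on the final file contents only: no clock, network, or randomness."""
    assert path.exists(), f"{path} was not created"
    assert path.read_text().splitlines() == expected_lines(INPUT_WORDS)


def test_oracle_solution_passes():
    """The oracle solution must pass the same check an agent would face."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "words.txt"
        out.write_text("apple\nfig\npear\n")  # what the oracle is assumed to write
        check_output(out)
```

Because the check depends only on the final file contents, it gives the same verdict on every run. Output that is unsorted or incomplete fails the assertion.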

Must-Haves:

  • 3–5+ years of hands-on engineering experience in at least one domain (SWE, DevOps, ML, security, data engineering, scientific computing)
  • Proficiency in Python and shell scripting (bash)
  • Comfortable writing Dockerfiles, building images, and debugging containers
  • Experience writing automated tests (pytest, unittest)
  • Familiarity with Git workflows (PRs, diffs, branching)
  • Strong technical writing: the ability to produce precise, unambiguous specifications

Nice-to-Haves:

  • Experience with AI evaluation benchmarks (SWE-bench, Terminal-Bench, GPQA)
  • Open-source contributions or GitHub PR history in relevant repos
  • Experience with the Harbor evaluation framework
  • Background in competitive programming or Kaggle
  • Domain depth in niche areas (kernel dev, cryptography, HPC, media processing)
  • Master's or PhD in CS

Engagement:

  • Fully remote
  • Fixed rate per accepted task: $40 to $60, plus a performance-based bonus

Job ID: 149172577