
SwarmBench Task Engineer — SWE / Code
Role Overview
We are looking for experienced SwarmBench Task Engineers — Code / SWE to design and build high-quality multi-agent benchmark tasks based on real-world software engineering workflows.
In this role, you will create tasks grounded in real open-source code changes such as bug fixes, migrations, and refactors. These tasks are used to evaluate how effectively AI agents can understand large codebases, apply precise modifications, and produce correct, testable outputs.
You will work within a structured evaluation framework (Harbor), define clear task instructions, design verification logic, and decompose complex engineering problems across multiple specialized agents.
What does the day-to-day look like?
Build multi-agent benchmark tasks based on real-world open-source code changes (bug fixes, migrations, refactors)
Work with the Harbor evaluation framework to run and validate tasks inside Docker environments
Write clear, precise task instructions specifying file paths, function signatures, expected behavior, and constraints
Design and implement Python-based verification scripts to validate correctness of agent-generated code changes
Create decomposition strategies that split complex code changes across multiple independent sub-agents
Run, debug, and refine tasks within containerized environments to ensure reproducibility and determinism
Evaluate task performance signals and improve task quality, clarity, and difficulty
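To give a flavor of the verification work described above, here is a minimal sketch of an assertion-based verification script. All names in it (the `slugify` function and its expected behavior) are hypothetical; a real task would import the function under test from the repository the agent modified.

```python
"""Illustrative verification script for a benchmark task.

The function below is a stand-in for code the agent was asked to fix;
in a real Harbor task, it would be imported from the target codebase.
"""

def slugify(title: str) -> str:
    # Hypothetical target function: lowercase and hyphenate a title.
    return "-".join(title.lower().split())

def verify() -> None:
    # Assertion-based checks mirroring the task's expected behavior.
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Multiple   Spaces ") == "multiple-spaces"
    print("PASS")

if __name__ == "__main__":
    verify()
```

Scripts like this run inside the task's Docker container, so a non-zero exit (a failed assertion) marks the agent's change as incorrect.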
Requirements
5+ years of experience in Python and JavaScript development
Experience with AI coding benchmarks (e.g., SWE-bench, Terminal-Bench)
Strong experience reading and navigating large open-source codebases (e.g., Django, Flask, FastAPI, Node.js, or similar)
Familiarity with Git workflows, including pull requests, diffs, cherry-picking, and working with specific commits
Comfortable working with Docker (writing Dockerfiles, building images, debugging container issues)
Experience writing test scripts (pytest, unittest, or custom assertion-based testing)
Ability to write clear, precise, and unambiguous technical specifications
If interested, please submit your CV to [Confidential Information] or share it via WhatsApp at 8827565832.
Stay updated with our latest job opportunities and company news by following us on LinkedIn: https://www.linkedin.com/company/sourcebae
Job ID: 147072507