SwarmBench Task Engineer — Reasoning / Math role

5-10 Years
20 - 25 LPA
  • Posted an hour ago

Job Description

Job Designation: SwarmBench Task Engineer — Reasoning / Math role

Experience: 5-10 Years

Work Mode: Remote

Freelancing Role

Role Overview

We are seeking a highly analytical, computationally proficient individual with a strong research background to join our team. You will contribute by either crafting challenging and insightful problems in your research domain or devising elegant computational solutions.

Responsibilities:

  • Build multi-agent benchmark tasks that require multi-step mathematical reasoning, proof construction, or algorithmic problem-solving
  • Design problems that are genuinely hard for a single agent but decomposable — competition math, numerical analysis, combinatorial optimization, statistical inference
  • Create verification scripts that check mathematical correctness — numerical answers with appropriate tolerance, proof step validity, algorithm output correctness
  • Write clear problem statements with precise notation, definitions, and output format
  • Create decomposition guides that split problems into independent sub-computations or parallel solution strategies
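
As a rough illustration of the "numerical answers with appropriate tolerance" responsibility, a verification script might look like the minimal sketch below. The function name `check_answers`, the dictionary-based answer format, and the default tolerances are assumptions made for illustration only; they are not taken from any actual SwarmBench harness.

```python
import math


def check_answers(submitted, expected, rel_tol=1e-9, abs_tol=1e-6):
    """Return the fraction of expected answers matched within tolerance.

    `submitted` and `expected` map hypothetical answer keys to numbers.
    Missing, non-numeric, or non-finite submissions count as failures.
    """
    if not expected:
        raise ValueError("expected answers must not be empty")
    passed = 0
    for key, want in expected.items():
        got = submitted.get(key)
        is_number = isinstance(got, (int, float)) and not isinstance(got, bool)
        if is_number and math.isfinite(got) and math.isclose(
            got, want, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            passed += 1
    return passed / len(expected)


if __name__ == "__main__":
    score = check_answers({"a": 1.0000001, "b": 2.0}, {"a": 1.0, "b": 2.5})
    print(score)
```

Combining a relative and an absolute tolerance is one common design choice: the absolute term handles answers near zero, where a purely relative test would be too strict.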

Required Qualifications:

  • 5+ years in mathematics, quantitative research, or computational science (competition math, university-level mathematics, or a quantitative research background).
  • Python programming with NumPy, SciPy, or symbolic computation (SymPy).
  • Experience writing mathematical proofs or formal derivations.
  • Ability to create problems with precise, verifiable answers — not subjective or open-ended.
  • Experience with AI coding benchmarks (SWE-bench, Terminal-bench).
  • Comfortable with Docker: writing Dockerfiles, building images, and debugging container issues.
  • Understanding of numerical methods — floating point tolerance, convergence criteria, error bounds.

Strong plus: 

  • Experience creating math competition problems (AMC, AIME, Putnam, IMO, or similar). 
  • Research in mathematics, theoretical CS, or quantitative fields. 
  • Experience with automated theorem proving or formal verification. 
  • Knowledge of AI reasoning benchmarks (GSM8K, MATH, AIME, GPQA, ARC-AGI). 
  • Experience with large-scale numerical computation or scientific computing

Example of what you will produce:

A task requiring analysis of a system of 50 coupled differential equations modeling a chemical reaction network. The agent must determine equilibrium concentrations, stability conditions, and bifurcation points. Input includes the reaction network as a matrix, rate constants, and initial conditions. The verifier checks numerical answers with tolerance (1e-6), validates eigenvalue analysis for stability, and confirms bifurcation parameter ranges. The decomposition splits into 4 sub-agents: one computes equilibria, one analyzes local stability, one maps bifurcations, and one synthesizes the phase portrait. Oracle scores 1.0, single-agent scores 0.25, multi-agent scores 0.80.
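
The equilibrium and stability checks described in this example could be sketched roughly as follows. This is a toy illustration under stated assumptions, not the actual verifier: the two-species system in `rhs`, the helper names `numerical_jacobian` and `verify_equilibrium`, and the finite-difference step size are all hypothetical, and a real task would use the 50-equation network supplied as input.

```python
import numpy as np


def rhs(x):
    """Hypothetical two-species toy system standing in for the real network."""
    a, b = x
    return np.array([a * (1.0 - a) - a * b, b * (a - 0.5)])


def numerical_jacobian(f, x, h=1e-6):
    """Approximate the Jacobian of f at x with central differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    jac = np.empty((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        jac[:, j] = (f(x + step) - f(x - step)) / (2.0 * h)
    return jac


def verify_equilibrium(f, x_star, claimed_stable, tol=1e-6):
    """Check that x_star is an equilibrium and that the stability claim holds.

    The point is accepted as an equilibrium if every component of f(x_star)
    is within `tol` of zero. It is treated as locally stable if all Jacobian
    eigenvalues have strictly negative real part.
    """
    x_star = np.asarray(x_star, dtype=float)
    if np.max(np.abs(f(x_star))) > tol:
        return False
    eigenvalues = np.linalg.eigvals(numerical_jacobian(f, x_star))
    is_stable = bool(np.all(eigenvalues.real < 0.0))
    return is_stable == claimed_stable


if __name__ == "__main__":
    print(verify_equilibrium(rhs, [0.5, 0.5], claimed_stable=True))
    print(verify_equilibrium(rhs, [1.0, 0.0], claimed_stable=True))
```

The eigenvalue test only settles hyperbolic equilibria; points with an eigenvalue whose real part is numerically zero would need a separate treatment, which is also where bifurcation checks would come in.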

Perks of Freelancing With Turing

  • Work on cutting-edge AI projects with leading foundation model companies
  • Collaborate on high-impact work at the frontier of LLM evaluation and reasoning
  • Remote, flexible opportunities with global teams

Offer Details

  • Commitments Required: 8 hours per day with a 4-hour overlap with PST.
  • Employment Type: Contractor position (Note: this role does not include medical/paid leave).
  • Duration of Contract: 4 weeks; expected start date is next week.

More Info

Open to candidates from: India

Job ID: 147077865
