Job Designation: Swarm Bench Task Engineer — Reasoning / Math role
Experience: 5-10 Years
Work Mode: Remote
Freelancing Role
Role Overview
We are seeking a highly analytical and computationally proficient individual with a strong research background to join our team. You will contribute by either crafting challenging and insightful problems in your research domain or devising elegant computational solutions to them.
Responsibilities:
- Build multi-agent benchmark tasks that require multi-step mathematical reasoning, proof construction, or algorithmic problem-solving
- Design problems that are genuinely hard for a single agent but decomposable — competition math, numerical analysis, combinatorial optimization, statistical inference
- Create verification scripts that check mathematical correctness — numerical answers with appropriate tolerance, proof step validity, algorithm output correctness
- Write clear problem statements with precise notation, definitions, and output format
- Create decomposition guides that split problems into independent sub-computations or parallel solution strategies
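As a rough illustration of the verification scripts mentioned above, the sketch below checks a list of numerical answers against reference values within a tolerance. The function name `check_answers` and the default tolerances are assumptions made for this example, not a prescribed interface.

```python
import math

def check_answers(submitted, expected, rel_tol=1e-9, abs_tol=1e-6):
    """Return True only if every submitted value matches its reference value.

    Hypothetical helper: the name and default tolerances are illustrative.
    """
    # A wrong number of answers is treated as a failure, not a partial match.
    if len(submitted) != len(expected):
        return False
    # math.isclose returns False for NaN, so malformed values are rejected.
    return all(
        math.isclose(s, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for s, e in zip(submitted, expected)
    )

# Example: the first answer is off by 1e-7, which is inside the tolerance.
passed = check_answers([1.0000001, 2.0], [1.0, 2.0])
```

Combining an absolute and a relative tolerance keeps the check meaningful both for answers near zero and for answers of large magnitude.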
Required Qualifications:
- 5+ years in mathematics, quantitative research, or computational science, with a background in competition math, university-level mathematics, or quantitative research.
- Python programming with NumPy, SciPy, or symbolic computation (SymPy).
- Experience writing mathematical proofs or formal derivations.
- Ability to create problems with precise, verifiable answers — not subjective or open-ended.
- Experience with AI coding benchmarks (SWE-bench, Terminal-bench).
- Comfortable with Docker: writing Dockerfiles, building images, and debugging container issues.
- Understanding of numerical methods — floating point tolerance, convergence criteria, error bounds.
Strong plus:
- Experience creating math competition problems (AMC, AIME, Putnam, IMO, or similar).
- Research in mathematics, theoretical CS, or quantitative fields.
- Experience with automated theorem proving or formal verification.
- Knowledge of AI reasoning benchmarks (GSM8K, MATH, AIME, GPQA, ARC-AGI).
- Experience with large-scale numerical computation or scientific computing.
Example of what you will produce:
A task requiring analysis of a system of 50 coupled differential equations modeling a chemical reaction network. The agent must determine equilibrium concentrations, stability conditions, and bifurcation points. Input includes the reaction network as a matrix, rate constants, and initial conditions. The verifier checks numerical answers with tolerance (1e-6), validates eigenvalue analysis for stability, and confirms bifurcation parameter ranges. The decomposition splits into 4 sub-agents: one computes equilibria, one analyzes local stability, one maps bifurcations, and one synthesizes the phase portrait. Oracle scores 1.0, single-agent scores 0.25, multi-agent scores 0.80.
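To make the equilibrium and stability steps concrete, here is a minimal sketch on a toy two-variable linear system dx/dt = Mx + b, where the equilibrium solves Mx = -b and the equilibrium is stable when every eigenvalue of M has a negative real part. The matrix, the vector, and all function names are made up for illustration. A real 50-equation task would more likely rely on NumPy or SciPy; this sketch uses only the standard library.

```python
import cmath

def equilibrium_2x2(M, b):
    """Solve M x = -b for a 2x2 matrix M using Cramer's rule."""
    (a11, a12), (a21, a22) = M
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("M is singular; the equilibrium is not unique")
    r1, r2 = -b[0], -b[1]
    return ((r1 * a22 - a12 * r2) / det, (a11 * r2 - a21 * r1) / det)

def eigenvalues_2x2(M):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a11, a12), (a21, a22) = M
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + disc) / 2, (trace - disc) / 2)

def is_stable(M):
    """Stable if all eigenvalues have strictly negative real part."""
    return all(ev.real < 0 for ev in eigenvalues_2x2(M))

# Toy data (hypothetical, not taken from any real reaction network).
M = ((-2.0, 1.0), (1.0, -3.0))
b = (1.0, 2.0)
x_star = equilibrium_2x2(M, b)
stable = is_stable(M)
```

For a nonlinear network, M would be replaced by the Jacobian evaluated at each equilibrium, and a verifier could compare the submitted eigenvalues against reference values within the stated tolerance.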
Perks of Freelancing With Turing
- Work on cutting-edge AI projects with leading foundation model companies
- Collaborate on high-impact work at the frontier of LLM evaluation and reasoning
- Remote, flexible opportunities with global teams
Offer Details
- Commitments Required: 8 hours per day with a 4-hour overlap with PST.
- Employment Type: Contractor position (Note: this role does not include medical/paid leave).
- Duration of Contract: 4 weeks; [expected start date is next week].