SwarmBench Task Engineer — Reasoning / Math role

Encore It Solutions

Remote

5-10 Years

20 - 25 LPA

Job Description

Job Designation: Swarm Bench Task Engineer — Reasoning / Math role

Experience: 5-10 Years

Work Mode: Remote

Freelancing Role

Role Overview

We are seeking a highly analytical, computationally proficient individual with a strong research background to join our team. In this role you will either craft challenging, insightful problems in your research domain or devise elegant computational solutions to them.

Responsibilities:

Build multi-agent benchmark tasks that require multi-step mathematical reasoning, proof construction, or algorithmic problem-solving
Design problems that are genuinely hard for a single agent but decomposable — competition math, numerical analysis, combinatorial optimization, statistical inference
Create verification scripts that check mathematical correctness — numerical answers with appropriate tolerance, proof step validity, algorithm output correctness
Write clear problem statements with precise notation, definitions, and output format
Create decomposition guides that split problems into independent sub-computations or parallel solution strategies
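
As an illustration of the verification scripts described above, here is a minimal sketch of a tolerance-based checker for numerical answers. The function name `score`, the dictionary-based answer format, and the default tolerance are assumptions made for this example, not a prescribed SwarmBench interface.

```python
import math

def score(submitted, expected, abs_tol=1e-6):
    """Return the fraction of expected answers matched within abs_tol.

    `submitted` and `expected` map answer names to numbers (a hypothetical
    format). Missing or non-numeric answers count as wrong.
    """
    hits = 0
    for key, target in expected.items():
        value = submitted.get(key)
        if isinstance(value, (int, float)) and math.isclose(
            value, target, rel_tol=0.0, abs_tol=abs_tol
        ):
            hits += 1
    return hits / len(expected)

expected = {"x": 1.0, "y": 2.0, "z": 3.0, "w": 4.0}
submitted = {"x": 1.0000001, "y": 2.1, "z": 3.0}  # "w" is missing
print(score(submitted, expected))
```

Using an absolute tolerance alone suits answers of order one; a real verifier would likely combine absolute and relative tolerance when answers span several orders of magnitude.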

Required Qualifications:

5+ years in mathematics, quantitative research, or computational science (competition math, university-level mathematics, or quantitative research background).
Python programming with NumPy, SciPy, or symbolic computation (SymPy).
Experience writing mathematical proofs or formal derivations.
Ability to create problems with precise, verifiable answers — not subjective or open-ended.
Experience with AI coding benchmarks (SWE-bench, Terminal-bench).
Comfortable with Docker: writing Dockerfiles, building images, and debugging container issues.
Understanding of numerical methods — floating point tolerance, convergence criteria, error bounds.

Strong plus:

Experience creating math competition problems (AMC, AIME, Putnam, IMO, or similar).
Research in mathematics, theoretical CS, or quantitative fields.
Experience with automated theorem proving or formal verification.
Knowledge of AI reasoning benchmarks (GSM8K, MATH, AIME, GPQA, ARC-AGI).
Experience with large-scale numerical computation or scientific computing.

Example of what you will produce:

A task requiring analysis of a system of 50 coupled differential equations modeling a chemical reaction network. The agent must determine equilibrium concentrations, stability conditions, and bifurcation points. Input includes the reaction network as a matrix, rate constants, and initial conditions. The verifier checks numerical answers with tolerance (1e-6), validates eigenvalue analysis for stability, and confirms bifurcation parameter ranges. The decomposition splits into 4 sub-agents: one computes equilibria, one analyzes local stability, one maps bifurcations, and one synthesizes the phase portrait. Oracle scores 1.0, single-agent scores 0.25, multi-agent scores 0.80.
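
To make the verifier part of this example concrete, the sketch below applies the same checks to a deliberately tiny stand-in: a single reversible reaction A <-> B instead of 50 coupled equations. The function names, the closed-form equilibrium, and the stability criterion are assumptions chosen for illustration; a real task would compute the reference answers with an oracle solver.

```python
import numpy as np

def equilibrium(k1, k2, total):
    """Equilibrium of A <-> B (forward rate k1, reverse rate k2, A + B = total)."""
    a_star = k2 * total / (k1 + k2)
    return np.array([a_star, total - a_star])

def jacobian(k1, k2):
    """Jacobian of dA/dt = -k1*A + k2*B, dB/dt = k1*A - k2*B (constant here)."""
    return np.array([[-k1, k2], [k1, -k2]])

def verify(submitted, k1, k2, total, tol=1e-6):
    """Check submitted concentrations and the stability of the equilibrium."""
    expected = equilibrium(k1, k2, total)
    conc_ok = np.allclose(submitted, expected, rtol=0.0, atol=tol)
    eigs = np.linalg.eigvals(jacobian(k1, k2))
    # One eigenvalue is 0 (mass conservation); the other is -(k1 + k2).
    # No eigenvalue may have a positive real part beyond the tolerance.
    stable = bool(np.all(eigs.real <= tol))
    return conc_ok and stable

print(verify([1.0, 2.0], k1=2.0, k2=1.0, total=3.0))    # correct answer
print(verify([1.0, 2.001], k1=2.0, k2=1.0, total=3.0))  # off by more than 1e-6
```

The zero eigenvalue reflects the conservation law A + B = total, which is why the stability test allows eigenvalues up to the tolerance rather than requiring strictly negative real parts.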

Perks of Freelancing With Turing

Work on cutting-edge AI projects with leading foundation model companies
Collaborate on high-impact work at the frontier of LLM evaluation and reasoning
Remote, flexible opportunities with global teams

Offer Details

Commitments Required: 8 hours per day with a 4-hour overlap with PST.
Employment Type: Contractor position (Note: this role does not include medical/paid leave).
Duration of Contract: 4 weeks; expected start date is next week.