millionlogics

AI Agent Evaluation Engineer (with Python Skills)

5-7 Years
  • Posted 10 days ago

Job Description

Company Description

MillionLogics is a global IT solutions provider and a trusted Oracle Partner, combining innovative technology with strategic vision to empower organizations. With headquarters in London, UK, and a development hub in Hyderabad, India, MillionLogics specializes in Data & AI, Cloud Solutions, IT Consulting, and Oracle Cloud and database technologies. Backed by a team of over 50 Oracle experts, the company is committed to delivering tailored, results-driven solutions for digital transformation. At MillionLogics, the focus is on unlocking business potential with scalable, future-ready IT services. Learn more at https://millionlogics.com.

Role Description

We are seeking experienced Task Engineers (Data Analysis) to design and develop high-quality multi-agent benchmark tasks that evaluate the analytical reasoning, coordination, and execution capabilities of advanced AI systems.

In this role, you will build realistic benchmark tasks that require AI agents to analyze large, messy, multi-source datasets, decompose work across specialist sub-agents, and arrive at specific, verifiable conclusions. These tasks may involve structured and semi-structured data such as CSVs, JSON files, logs, reports, survey results, vendor assessments, or financial and operational documents.

Your work will help measure how effectively AI systems perform complex analytical workflows involving cross-referencing, contradiction detection, anomaly identification, and statistical reasoning across multiple data sources.
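As an illustration of the cross-referencing, contradiction detection, and anomaly identification described above, here is a minimal Python sketch that uses only the standard library. The invoice and ledger data, field names, and the 0.01 tolerance are hypothetical. The outlier check uses a median/MAD-based modified z-score with the commonly used 3.5 cutoff, which is one reasonable choice among several (IQR fences or plain z-scores are alternatives).

```python
import csv
import io
import json
import statistics

# Hypothetical source 1: vendor invoices exported as CSV.
INVOICES_CSV = """invoice_id,vendor,amount
INV-001,Acme,1200.00
INV-002,Acme,1150.00
INV-003,Globex,1180.00
INV-004,Globex,9800.00
INV-005,Initech,1210.00
"""

# Hypothetical source 2: a payment ledger exported as JSON.
LEDGER_JSON = """[
  {"invoice_id": "INV-001", "paid": 1200.00},
  {"invoice_id": "INV-002", "paid": 1105.00},
  {"invoice_id": "INV-003", "paid": 1180.00},
  {"invoice_id": "INV-004", "paid": 9800.00}
]"""


def cross_reference(invoices, ledger, tolerance=0.01):
    """Compare invoiced vs. paid amounts; return contradictions and unmatched IDs."""
    paid_by_id = {row["invoice_id"]: row["paid"] for row in ledger}
    contradictions, unmatched = [], []
    for inv in invoices:
        paid = paid_by_id.get(inv["invoice_id"])
        if paid is None:
            # Present in the invoices but missing from the ledger.
            unmatched.append(inv["invoice_id"])
        elif abs(paid - inv["amount"]) > tolerance:
            # Both sources mention the invoice but disagree on the amount.
            contradictions.append((inv["invoice_id"], inv["amount"], paid))
    return contradictions, unmatched


def robust_outliers(invoices, threshold=3.5):
    """Flag amounts whose modified z-score (median/MAD based) exceeds threshold."""
    amounts = [inv["amount"] for inv in invoices]
    median = statistics.median(amounts)
    mad = statistics.median([abs(a - median) for a in amounts])
    if mad == 0:
        return []  # No spread around the median, so no meaningful score.
    return [
        inv["invoice_id"]
        for inv in invoices
        if 0.6745 * abs(inv["amount"] - median) / mad > threshold
    ]


# Parse both sources into plain Python structures.
invoices = [
    {**row, "amount": float(row["amount"])}
    for row in csv.DictReader(io.StringIO(INVOICES_CSV))
]
ledger = json.loads(LEDGER_JSON)

contradictions, unmatched = cross_reference(invoices, ledger)
print("contradictions:", contradictions)
print("unmatched:", unmatched)
print("outliers:", robust_outliers(invoices))
```

A task built on data like this would ask for the specific invoice IDs and amounts, so an agent that only reports "some payments look inconsistent" cannot pass.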

Offer Details:

  • Pay: INR 90,000-1,00,000 per month (Net/take-home)
  • Mode of work: Fully remote

What the day-to-day looks like:

  • Design and author multi-agent benchmark tasks centered on complex data analysis workflows
  • Create realistic synthetic datasets or curate real-world style datasets across domains such as finance, operations, security, or market analysis
  • Build tasks that require agents to perform cross-referencing, anomaly detection, contradiction identification, and statistical computation across multiple sources
  • Develop decomposition guides that split analytical work across specialist sub-agents such as financial, technical, security, or operations analysts
  • Write precise oracle logic or verification scripts that validate specific analytical conclusions rather than generic summaries
  • Create reproducible evaluation environments using Python and Docker
  • Review task performance signals to ensure strong separation between weaker and stronger agentic systems
  • Refine tasks to improve determinism, clarity, difficulty, and scoring quality
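A verification (oracle) script of the kind listed above checks specific conclusions rather than grading a free-text summary. The Python sketch below assumes, hypothetically, that the agent submits a JSON answer with the fields `contradicting_invoice`, `discrepancy_amount`, and `outlier_invoices`, and that the expected values were computed from the dataset when the task was authored. Real tasks would define their own answer schema and tolerances.

```python
import json
import math

# Hypothetical ground truth, computed from the task's dataset at authoring time
# (never derived from the agent's own output).
EXPECTED = {
    "contradicting_invoice": "INV-002",
    "discrepancy_amount": 45.00,
    "outlier_invoices": ["INV-004"],
}


def verify(answer_text, expected=EXPECTED, abs_tol=0.01):
    """Return a dict of check name -> bool for a JSON answer submitted by an agent."""
    try:
        answer = json.loads(answer_text)
    except json.JSONDecodeError:
        return {"valid_json": False}
    if not isinstance(answer, dict):
        return {"valid_json": False}

    checks = {"valid_json": True}
    # Identifiers: exact match after trimming whitespace and normalizing case.
    checks["contradicting_invoice"] = (
        str(answer.get("contradicting_invoice", "")).strip().upper()
        == expected["contradicting_invoice"]
    )
    # Numbers: compare within an absolute tolerance, rejecting booleans.
    amount = answer.get("discrepancy_amount")
    checks["discrepancy_amount"] = (
        isinstance(amount, (int, float))
        and not isinstance(amount, bool)
        and math.isclose(amount, expected["discrepancy_amount"], abs_tol=abs_tol)
    )
    # Lists of IDs: compare as sets so the reported order does not matter.
    outliers = answer.get("outlier_invoices")
    checks["outlier_invoices"] = (
        isinstance(outliers, list)
        and all(isinstance(x, str) for x in outliers)
        and set(outliers) == set(expected["outlier_invoices"])
    )
    return checks


def passed(checks):
    """A task passes only if every individual check passes."""
    return all(checks.values())


good = '{"contradicting_invoice": "inv-002", "discrepancy_amount": 45.0, "outlier_invoices": ["INV-004"]}'
vague = '{"summary": "Some invoices look inconsistent."}'
print(passed(verify(good)), passed(verify(vague)))
```

Comparing identifiers exactly, numbers within a tolerance, and lists as sets keeps the oracle deterministic while avoiding false failures caused by formatting differences such as letter case or ordering.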

Requirements:

  • 5+ years of experience in data analysis
  • Strong proficiency in SQL and Python for data analysis and scripting (pandas, NumPy, or similar)
  • Experience working with real-world, messy datasets (CSV, JSON, logs, reports)
  • Ability to design non-trivial analytical questions with clear, specific, and verifiable answers
  • Solid understanding of statistical concepts (averages, distributions, outliers, correlations)
  • Familiarity with AI coding benchmark environments (e.g., SWE-bench, Terminal-Bench)
  • Comfortable working with Docker (writing Dockerfiles, building images, debugging containers)

Offer Details:

  • Commitments Required: 8 hours per day with a 4-hour overlap with PST.
  • Employment Type: Contractor position (Note: this role does not include medical/paid leave).
  • Duration of Contract: 4 weeks (expected start date: next week).

How to Apply

Please send us your updated CV to [Confidential Information] with job ID 75113 in the email subject line.

Job ID: 146703797