
People Prime Worldwide

AI Evaluation Engineer

5-7 Years
  • Posted 2 days ago

Job Description

About Company:

Our client is a Palo Alto–based AI infrastructure and talent platform founded in 2018. It helps companies connect with remote software developers using AI-powered vetting and matching technology. Originally branded as the Intelligent Talent Cloud, it enabled companies to spin up their engineering dream team in the cloud by sourcing and managing vetted global talent. In recent years, they have evolved to support AI infrastructure and AGI workflows, offering services in model training, fine-tuning, and deployment—powered by their internal AI platform, ALAN, and backed by a vast talent network. They reported $300 million in revenue and reached profitability. Their growth is driven by demand for annotated training data from AI labs, including major clients like OpenAI, Google, Anthropic, and Meta.

Job Description:

Job Title: Agentic Coding Annotator - Online / Offline Tasks

Location: Pan India

Experience: 5+ yrs.

Employment Type: Contract to hire

Work Mode: Remote

Notice Period: Immediate joiners

Requirements

Software Engineering Fluency (Mandatory)

  • 5+ years of experience in software engineering, QA, developer tooling, data/ML engineering, or similar code-heavy roles
  • Strong hands-on experience in at least 1–2 programming languages or ecosystems
  • Representative languages include: Python, JavaScript/TypeScript, Rust, Java, C/C++, Bash/CLI environments, Haskell, Swift, SQL, or other production-relevant ecosystems
  • Ability to: read and understand unfamiliar codebases; run and interpret tests, scripts, and CLI tools; debug issues and reason about edge cases or partial fixes; evaluate whether an implementation is functionally correct

Terminal & Tooling Skills (Mandatory)

  • Comfortable working in Linux/Ubuntu-like environments
  • Proficient with: terminal workflows; Git basics; code editors or IDEs; package managers and test runners; JSON, YAML, and Markdown
  • Familiarity with Docker and reproducible environments (strong plus, especially for offline work)

Coding-Agent Workflow Familiarity (Mandatory)

  • Comfortable working with, or quickly adapting to, agentic coding environments such as OpenCode, Claude Code, Cursor, or similar coding-agent tools

Quality Judgment & Annotation Accuracy (Mandatory)

Ability to:

  • Compare multiple model trajectories and identify meaningful differences
  • Distinguish correctness from style, communication quality, and agent behavior
  • Evaluate solutions consistently using defined rubrics
  • Follow detailed process instructions without deviation
  • Maintain consistency across repeated or similar evaluations
  • Write concise, evidence-based rationales (not generic summaries)

Work Style

  • Highly detail-oriented and process-driven
  • Comfortable with repetitive, high-precision evaluation work
  • Able to maintain consistency across long tasks and multiple model runs
  • Proactively flags ambiguity instead of making assumptions
  • Balances realism with strict evaluation consistency

Additional Preferred Qualifications (Offline / Senior Candidates)

  • Strong Docker skills and experience building/debugging reproducible environments
  • Experience working in large, complex repositories (not just small or greenfield projects)
  • Demonstrated originality and sound engineering judgment in defining technical problems
  • Ability to design realistic, non-trivial tasks that go beyond tutorials, README flows, or simple bug fixes

More Info

Job ID: 146769537