Role Overview
Wiingy is seeking expert biologists with a background in data annotation to join our AI Model Assessment team, supporting the evaluation and improvement of frontier AI models in the biological sciences. As an AI Model Assessment Specialist, you will design rigorous test cases, evaluate model outputs, and provide expert-level feedback that directly informs how advanced AI systems reason through complex biology problems. Your contributions will play a critical role in benchmarking AI performance and ensuring scientific integrity across our platform.
You will be assigned one of two task types:
- Assessment Authoring - Design and write original, expert-level biology questions and scenarios used to probe and evaluate AI model capabilities. Rate difficulty, provide gold-standard solutions, and submit for peer review.
- Assessment Verification - Review pre-written assessment items for scientific accuracy, clarity, and rigor. Evaluate AI-generated responses against gold-standard answers, document findings, and justify any edits made.
Biology Domains Covered
Ecology & Evolutionary Biology, Genetics, Cell Biology, Biomedical Science, Microbiology, Molecular Biology, Biochemistry, Neuroscience, and Miscellaneous Biology.
Key Responsibilities
Assessment Design
- Design original, challenging biology questions that probe deep conceptual understanding, multi-step reasoning, and experimental interpretation — not surface-level recall
- Ensure all assessment items are unambiguous, self-contained, and precisely defined, with all necessary information included in the problem statement
- Provide one correct answer and several plausible but subtly incorrect distractors that challenge expert-level solvers
- Write detailed Chain-of-Thought solutions with clear, logically sequenced intermediate steps in markdown format
- Supply 1–5 academic references per item from peer-reviewed journals or reputable university repositories
Model Evaluation & Annotation
- Systematically evaluate AI-generated biology responses against expert gold-standard answers
- Assess model outputs across multiple dimensions: factual accuracy, depth of reasoning, logical coherence, and scientific rigor
- Identify and document errors, hallucinations, reasoning gaps, and unsupported claims in AI responses
- Rate AI response quality using structured rubrics and provide actionable written feedback
- Apply annotation best practices — including label consistency, inter-rater calibration, and structured documentation — drawn from prior data annotation experience
- Flag edge cases, ambiguous outputs, or failure modes that require escalation
Ideal Qualifications
Academic Background
- PhD or doctoral candidate in Biology, Molecular Biology, Biochemistry, Neuroscience, or a closely related field
- Master's degree considered for candidates with exceptional depth and research experience in a specific subdomain
- Strong command of graduate-level biological concepts, experimental design, and scientific data interpretation
Data Annotation Experience
- Prior experience in data annotation, content labeling, or AI training data workflows is strongly preferred
- Familiarity with annotation platforms and tools such as Scale AI, Appen, Labelbox, Surge AI, or similar
- Hands-on experience applying structured annotation rubrics, style guides, or evaluation frameworks
- Understanding of inter-rater reliability, calibration workflows, and annotation quality metrics
- Experience annotating scientific, technical, or STEM-domain content is a significant advantage
- Prior work on AI model evaluation, RLHF (Reinforcement Learning from Human Feedback), or preference ranking tasks is a strong plus
More About the Opportunity
- Fully remote and asynchronous - work on your own schedule, from anywhere
- Expected commitment: 10+ hours/week, with flexibility based on availability
- Ideal for PhD students, postdoctoral researchers, or early-career biology professionals with annotation experience seeking meaningful, flexible contract work
- Gain firsthand experience shaping the scientific capabilities of state-of-the-art frontier AI models
- Projects may be extended, shortened, or concluded early depending on platform needs and individual performance