Mercor is seeking PhD holders and doctoral candidates in STEM disciplines with strong scientific programming experience to join a high-impact AI research initiative in partnership with a leading AI lab.
This role involves evaluating and enhancing large language models (LLMs) by applying subject-matter expertise in scientific coding to rigorously assess model performance in STEM-specific programming tasks.
Key Responsibilities
- Evaluate the accuracy, logic, and domain relevance of Python-based code and reasoning generated by LLMs in tasks related to graduate-level science.
- Review outputs across physics, chemistry, biology, and other scientific domains involving numerical methods, data analysis, modeling, and simulation.
- Complete structured evaluations for each task, answering 8-10 rubric questions and providing short written comments explaining each judgment.
- Identify code correctness issues, scientific inaccuracies, and gaps in technical execution.
- Work independently using provided evaluation tools and rubrics in an asynchronous, remote setting.
You're a strong fit if you have:
- A PhD (or are currently a PhD candidate) in Physics, Chemistry, Biology, Engineering, or a related STEM field.
- Strong experience using Python in scientific contexts (e.g. research, modeling, or applied analysis).
- Familiarity with scientific libraries such as NumPy, SciPy, pandas, or similar.
- A strong ability to critically evaluate both scientific reasoning and code implementation.
- Excellent written communication skills and attention to technical detail.
- Comfort working independently and asynchronously in a remote environment.
Role Details
- Part-time (10-20 hours/week) with flexible scheduling.
- 100% remote and asynchronous: work from anywhere, anytime.
Compensation & Legal
- Contractor position via Mercor, paid hourly.
- Competitive rates based on domain expertise, ranging from $20 to $35/hour.
- Weekly payments processed securely through Stripe Connect.
About Mercor
Mercor is a San Francisco-based company specializing in connecting elite professionals with cutting-edge AI initiatives. Our investors include Benchmark, General Catalyst, Adam D'Angelo, Larry Summers, and Jack Dorsey. We help leading AI labs accelerate progress by bringing in top-tier human expertise.
We consider all qualified applicants without regard to legally protected characteristics and provide reasonable accommodations upon request.