About The Company
Based in San Francisco, California, Turing is the world's leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing supports customers in two ways: first, by accelerating frontier research with high-quality data, advanced training pipelines, and top AI researchers who specialize in coding, reasoning, STEM, multilinguality, multimodality, and agents; and second, by applying that expertise to help enterprises turn AI from proof of concept into proprietary intelligence, with systems that perform reliably, deliver measurable impact, and drive lasting results on the P&L.
About The Role
We are seeking experienced Data Analysts (MLE Bench) to join our team and contribute to benchmark-driven evaluation projects centered on real-world machine learning systems. This role involves hands-on analytical work with production-like datasets, metrics, and machine learning outputs to evaluate, diagnose, and enhance the performance of advanced AI systems. The ideal candidate will operate at the intersection of data analysis and machine learning, demonstrating strong analytical skills and the ability to work with real datasets and evaluation workflows. This position offers a unique opportunity to be involved in high-impact projects that push the boundaries of AI evaluation and performance assessment.
Qualifications
The ideal candidate will have at least 3 years of experience as a Data Analyst or analytics-focused engineer. Proficiency in Python for data analysis is essential, along with solid experience in SQL and working with relational datasets. Candidates should have a proven track record of analyzing machine learning outputs and evaluation metrics, with a strong understanding of statistics and analytical reasoning. The ability to work effectively with large, complex datasets and extract reliable insights is crucial. Additionally, candidates must demonstrate the ability to write clean, readable, and well-documented analytical code. Excellent spoken and written English communication skills are required to collaborate effectively with cross-functional teams.
Responsibilities
- Analyze structured and unstructured datasets generated from machine learning training, inference, and evaluation pipelines to identify patterns, anomalies, and areas for improvement.
- Define, compute, and validate metrics used to evaluate model performance and behavior, ensuring alignment with project goals and benchmarks.
- Investigate data distributions, model outputs, failure modes, and edge cases relevant to benchmark tasks to inform evaluation strategies.
- Write and run Python and SQL scripts to analyze data, generate reports, and support evaluation workflows, ensuring reproducibility and accuracy.
- Validate data quality, consistency, and correctness across datasets and experimental results to maintain integrity and reliability.
- Create clear, well-documented analytical artifacts and reproducible workflows that facilitate collaboration and future analysis.
- Collaborate closely with machine learning engineers and researchers to design challenging, real-world evaluation scenarios that accurately reflect operational conditions.
Benefits
Working with Turing as a freelancer offers the flexibility of a fully remote environment, enabling you to work from anywhere. You will have the opportunity to engage with cutting-edge AI projects alongside leading LLM companies, gaining valuable experience in the rapidly evolving AI landscape. This role provides a platform to expand your professional network, enhance your skill set, and contribute to innovative solutions that have a tangible impact on AI development and deployment.
Equal Opportunity
Turing is committed to fostering an inclusive environment and equal opportunity employment. We celebrate diversity and are dedicated to creating an equitable workplace where all individuals are valued and respected. We do not discriminate based on race, gender, age, religion, sexual orientation, disability, or any other protected characteristic. All qualified candidates are encouraged to apply and join us in advancing frontier AI research and applications.