
Role Specification: Data Scientist
About the Role
As a Data Scientist, you will be at the core of the AI revolution, where your expertise fuels the development of the most advanced Large Language Models. Our team is the driving force behind high-quality data, the essential ingredient for unlocking unprecedented AI breakthroughs. In this role, you will define data excellence and make a tangible impact on the future of intelligent systems by partnering with Research, Engineering, and Product teams, including Cloud AI Data.
A primary focus of this position is the strategic guidance and rigorous validation of data produced by agentic taskers. You will be responsible for ensuring that human data generation on agentic workflows meets the highest standards of accuracy, nuance, and utility for model training.
Responsibilities
Agentic Oversight: Design and implement frameworks to guide, monitor, and validate the data outputs generated by agentic taskers to ensure they align with gold-standard benchmarks.
Methodology Development: Develop new methodologies to improve the performance of models through superior training data, including innovative approaches to data collection and insight generation.
AI Integration: Utilize AI models and tools as integral components for evaluating, synthesizing, and understanding complex datasets to drive data quality.
Cross-Functional Partnership: Act as a critical technical partner, collaborating closely with Research, Engineering, and Product teams to define data excellence across the organization.
Outcome Ownership: Solve ambiguous problems and influence stakeholders to ensure data intelligence outcomes directly support product and business objectives.
Preferred Qualifications
Advanced Academic Background: PhD in a quantitative field such as Computer Science, Statistics, Mathematics, or a related domain.
Extensive Analytical Experience: 10 years of experience using analytics to solve complex product or business problems, including querying databases and performing advanced statistical analysis.
Technical Proficiency: Mastery of coding languages such as Python, R, or SQL for data manipulation, modeling, and automation.
LLM & Agentic Expertise: Proven experience in evaluating, synthesizing, or validating datasets specifically for Large Language Models, with a focus on supervising automated or agent-driven workflows.
Strategic Leadership: Demonstrated ability to own high-level decision-making processes, solve ambiguous technical challenges, and influence diverse stakeholder groups.
Domain Depth: Deep expertise in AI data lifecycles, from initial collection and labeling strategy to final validation and model integration.
Job ID: 145514897