The Data Analyst for Scaled Evaluations is a key technical owner responsible for the integrity and quality of large-scale human-generated datasets used to train and evaluate Machine Learning models, and for the analytical insights drawn from them. The role involves querying massive datasets, building visualizations to monitor rater performance, and identifying systemic issues in evaluation workflows. The analyst serves as the bridge between raw operational data and actionable insights for engineering, ensuring high-quality inputs for Search and AI model development.
Core Responsibilities
- Data Querying & Management: Write complex SQL queries (PLX, BigQuery) to extract, transform, and analyze evaluation data. Manage data pipelines to ensure the smooth flow of evaluation tasks (JSON, CSV) to vendors and the ingestion of results back into the system.
- Quality Metrics & Dashboarding: Design and maintain dynamic dashboards (Looker, Tableau, or similar) to track core Operational KPIs: Rater Throughput, Average Handling Time (AHT), and Quality Scores. Analyze Inter-Rater Reliability (IRR) and Golden Set performance to identify discrepancies and ambiguous rater guidelines.
- Root Cause Analysis: Perform deep-dive analysis on low-quality data batches. Identify whether issues stem from rater error, tooling bugs, or ambiguous guidelines, and proactively identify edge cases that may degrade model performance.
- Golden Set & Training Strategy: Use data insights to curate Golden Sets (Ground Truth data) used for testing rater accuracy. Collaborate with Program Managers to refine training materials based on data-driven identification of common rater errors.
- Content Safety Governance: Monitor datasets for sensitive or offensive content anomalies, ensuring robust filtering and adherence to safety guidelines.
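For illustration, the two quality metrics named above (Inter-Rater Reliability and Golden Set performance) could be computed along the following lines. This is a minimal sketch, not a prescribed method: it assumes IRR is measured with Cohen's kappa for two raters (other measures such as Fleiss' kappa or Krippendorff's alpha may be used in practice), and the function names, item IDs, and labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: fraction of items where the two raters match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each rater's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


def golden_set_accuracy(rater_labels, golden_labels):
    """Share of Golden Set items a rater labeled correctly.

    Both arguments map a (hypothetical) item ID to a label. Items the rater
    labeled that are not in the Golden Set are ignored. Returns None if the
    rater saw no Golden Set items.
    """
    scored = [item for item in rater_labels if item in golden_labels]
    if not scored:
        return None
    correct = sum(rater_labels[item] == golden_labels[item] for item in scored)
    return correct / len(scored)


# Hypothetical example data.
rater_1 = ["relevant", "relevant", "not_relevant", "not_relevant"]
rater_2 = ["relevant", "not_relevant", "not_relevant", "not_relevant"]
kappa = cohens_kappa(rater_1, rater_2)

accuracy = golden_set_accuracy(
    {"q1": "relevant", "q2": "not_relevant", "q3": "relevant"},
    {"q1": "relevant", "q2": "relevant"},
)
```

A low kappa across many rater pairs on the same task type tends to point at ambiguous guidelines, while low Golden Set accuracy for a single rater tends to point at rater error or a training gap.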
Minimum Qualifications
- Bachelor's degree in a quantitative field (Data Science, Statistics, Computer Science) or equivalent practical experience.
- Proficiency in SQL: Ability to write joins, window functions, and aggregations for reporting.
- Experience with data visualization tools (Looker, Tableau, PowerBI).
- Strong analytical skills with the ability to translate raw data into strategic operational recommendations.
- Comfortable handling sensitive