Job Purpose
The Data Curation Developer focuses on curating data to produce high-quality data assets for R&D analysis. This role involves ensuring datasets meet analysis-ready and privacy requirements, integrating diverse datasets, and collaborating with various teams to support GSK's Disease Area Strategies and other key R&D priority areas.
Desired Skills And Experience
- Strong R skills (closely aligned with the GSK Data science software developer role).
- Proven ability to handle and process large datasets efficiently, ensuring data privacy.
- Proficiency in handling structured, semi-structured, and unstructured data while ensuring data privacy.
- Understanding of data governance principles and practices with a focus on data privacy.
- Experience with complex batch processing, Azure Data Factory, Databricks, Airflow, Delta Lake, PySpark, Pandas, and other Python dataframe libraries.
- Proven ability to collaborate with cross-functional teams.
- Strong communication skills to present curated data.
- Expertise in translating business needs into technical data requirements and processes.
- Proven ability to quantify, and provide insight into, the business impact and value created by data curation activities.
Key Responsibilities
- Ensure all datasets meet analysis-ready and privacy requirements by performing necessary data curation activities (e.g., pre-processing, contextualization, and/or anonymization).
- Process datasets to meet conditions mentioned in approved data re-use requests (e.g., remove subjects from countries that do not allow re-use).
- Write clean, readable code and ensure deliverables are appropriately quality controlled and documented.
- Integrate diverse datasets (e.g., clinical trials, real-world data, omics) into a unified format for consistent analysis.
- Provide coaching and peer review to ensure the team's work reflects industry best practices for data curation activities, including data privacy and anonymization standards.
- Lead the development of business requirements for data curation through collaboration with R&D business and data platform teams.
- Maintain strong connections with analytical groups and R&D Data Platform teams to ensure seamless data integration and usage.
Education
- Bachelor's degree or equivalent.