Roles & Responsibilities:
- Design, develop, and implement data pipelines, ETL/ELT processes, and data integration solutions
- Contribute to data pipeline projects from inception to deployment, managing scope, timelines, and risks
- Contribute to data models, data dictionaries, and other documentation for biopharma scientific data to ensure data accuracy and consistency
- Optimize large datasets for query performance
- Collaborate with global cross-functional teams, including research scientists, to understand data requirements and design solutions that meet business needs
- Implement data security and privacy measures to protect sensitive data
- Leverage cloud platforms (AWS preferred) to build scalable and efficient data solutions
- Collaborate with Data Architects, Business SMEs, Software Engineers, and Data Scientists to design and develop end-to-end data pipelines that meet fast-paced business needs across geographic regions
- Identify and resolve data-related challenges
- Adhere to best practices for coding, testing, and designing reusable code/components
- Explore new tools and technologies that can improve ETL platform performance
- Participate in sprint planning meetings and provide estimates for technical implementation
- Maintain documentation of processes, systems, and solutions
Skills:
- Proficiency in Python, RDF, SPARQL, graph databases (e.g., AllegroGraph), SQL, relational databases, ETL pipelines, big data technologies (e.g., Databricks), semantic data standards (OWL and other W3C standards, FAIR principles), and ontology development and semantic modeling practices
- Hands-on experience with big data technologies and platforms such as Databricks, including workflow orchestration and performance tuning of data processing workloads
- Excellent problem-solving skills and the ability to work with large, complex datasets
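As a hedged illustration of the semantic data stack named above (not a description of this team's actual codebase), the following minimal sketch builds a small RDF graph in Python and queries it with SPARQL using the open-source rdflib library; the ex: namespace and the Assay class are hypothetical placeholders:

```python
# Minimal sketch: build a small in-memory RDF graph and query it with
# SPARQL via the open-source rdflib library. The ex: namespace and the
# ex:Assay class are hypothetical placeholders, not a real ontology.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/biopharma/")

g = Graph()
g.bind("ex", EX)

# Two illustrative triples: an assay typed against a placeholder class,
# plus a human-readable label for it.
g.add((EX.assay42, RDF.type, EX.Assay))
g.add((EX.assay42, RDFS.label, Literal("Binding affinity assay")))

# SPARQL query over the in-memory graph.
results = g.query("""
    PREFIX ex: <http://example.org/biopharma/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?assay ?label
    WHERE {
        ?assay a ex:Assay ;
               rdfs:label ?label .
    }
""")

for row in results:
    print(row.assay, row.label)
```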