In this role, you will be responsible for constructing semantic data pipelines, integrating both relational and graph-based data sources, ensuring seamless data interoperability, and leveraging cloud platforms to scale data solutions effectively.
Roles & Responsibilities:
- Develop and maintain semantic data pipelines using Python, RDF, SPARQL, and linked data technologies.
- Develop and maintain semantic data models for biopharma scientific data.
- Integrate relational databases (e.g., PostgreSQL, MySQL, Oracle) with semantic frameworks.
- Ensure interoperability across federated data sources, linking relational and graph-based data.
- Implement and optimize CI/CD pipelines using GitLab and AWS.
- Leverage cloud services (AWS Lambda, S3, Databricks, etc.) to support scalable knowledge graph solutions.
- Collaborate with global multi-functional teams, including research scientists, data architects, business SMEs, software engineers, and data scientists, to understand data requirements, design solutions, and develop end-to-end data pipelines that meet fast-paced business needs across geographic regions.
- Collaborate with data scientists, engineers, and domain experts to improve research data accessibility.
- Adhere to standard processes for coding, testing, and designing reusable code/components.
- Explore new tools and technologies to improve ETL platform performance.
- Participate in sprint planning meetings and provide estimates for technical implementation.
- Maintain comprehensive documentation of processes, systems, and solutions.
- Harmonize research data to appropriate taxonomies, ontologies, and controlled vocabularies for context and reference knowledge.
What we expect of you
We are all different, yet we all use our unique contributions to serve patients.
Basic Qualifications and Experience:
- Doctorate degree OR Master's degree with 4-6 years of experience in Computer Science, IT, Computational Chemistry, Computational Biology/Bioinformatics or related field OR
- Bachelor's degree with 6-8 years of experience in Computer Science, IT, Computational Chemistry, Computational Biology/Bioinformatics or related field OR
- Diploma with 10-12 years of experience in Computer Science, IT, Computational Chemistry, Computational Biology/Bioinformatics or related field
Preferred Qualifications and Experience:
- 6+ years of experience designing and supporting software platforms for biopharma scientific research data analytics
Functional Skills:
Must-Have Skills:
- Advanced Semantic and Relational Data Skills: Proficiency in Python, RDF, SPARQL, graph databases (e.g., AllegroGraph), SQL, relational databases, ETL pipelines, big data technologies (e.g., Databricks), W3C semantic web standards (e.g., OWL), FAIR principles, ontology development, and semantic modeling practices.
- Cloud and Automation Expertise: Solid experience using cloud platforms (preferably AWS) for data engineering, along with Python for automation, data federation techniques, and model-driven architecture for scalable solutions.
- Technical Problem-Solving: Excellent problem-solving skills with hands-on experience in test automation frameworks (pytest), scripting tasks, and handling large, complex datasets.
Good-to-Have Skills:
- Experience in biotech/drug discovery data engineering
- Experience applying knowledge graphs, taxonomy and ontology concepts in life sciences and chemistry domains
- Experience with graph databases (AllegroGraph, Neo4j, GraphDB, Amazon Neptune)
- Familiarity with Cypher, GraphQL, or other graph query languages
- Experience with big data tools (e.g. Databricks)
- Experience in biomedical or life sciences research data management
Soft Skills:
- Excellent critical-thinking and problem-solving skills
- Good communication and collaboration skills
- Demonstrated ability to work effectively in a team setting
- Demonstrated presentation skills