In this vital role, you will design, build, and maintain data lake solutions for scientific data that drive business decisions for Research. You will deliver scalable, high-performance data engineering solutions for large scientific datasets and collaborate closely with Research customers.
Responsibilities
- Design, develop, and implement data pipelines, ETL/ELT processes, and data integration solutions.
- Take ownership of data pipeline projects from inception to deployment, managing scope, timelines, and risks.
- Develop and maintain data models, data dictionaries, and other documentation for biopharma scientific data to ensure data accuracy and consistency.
- Optimize large datasets for query performance.
- Collaborate with global multi-functional teams, including research scientists, to understand data requirements and design solutions that meet business needs.
- Implement data security and privacy measures to protect sensitive data.
- Leverage cloud platforms (AWS preferred) to build scalable and efficient data solutions.
- Collaborate with Data Architects, Business SMEs, Software Engineers, and Data Scientists to design and develop end-to-end data pipelines to meet fast-paced business needs across geographic regions.
- Identify and resolve complex data-related challenges.
- Adhere to standard methodologies for coding, testing, and designing reusable code and components.
- Explore new tools and technologies to improve ETL platform performance.
- Participate in sprint planning meetings and provide estimates for technical implementation.
- Maintain comprehensive documentation of processes, systems, and solutions.
Basic Qualifications
- Master's degree with 4-6 years of experience in Computer Science, IT, Computational Chemistry, Computational Biology/Bioinformatics, or a related field.
- OR
- Bachelor's degree with 6-8 years of experience in Computer Science, IT, Computational Chemistry, Computational Biology/Bioinformatics, or a related field.
- OR
- Diploma with 10-12 years of experience in Computer Science, IT, Computational Chemistry, Computational Biology/Bioinformatics, or a related field.
Preferred Qualifications
- 3+ years of experience in designing and supporting biopharma scientific research data pipelines.
Must-Have Skills
- Proficiency in SQL and Python for data engineering, test automation (e.g., pytest), and scripting tasks.
- Hands-on experience with big data technologies and platforms such as Databricks and Apache Spark (PySpark, Spark SQL), including workflow orchestration and performance tuning of large-scale data processing.
- Excellent problem-solving skills and the ability to work with large, complex datasets.
Good-to-Have Skills
- A passion for tackling complex challenges in drug discovery with technology and data.
- Experience writing and maintaining technical documentation in Confluence.
Professional Certifications
- Databricks Certified Data Engineer Professional (preferred).