Role Description:
In this role, you will design, build, and maintain data lake solutions for scientific data that drive business decisions for Research. You will build scalable, high-performance data engineering solutions for large scientific datasets and collaborate with Research stakeholders. The ideal candidate possesses experience in the pharmaceutical or biotech industry, demonstrates strong technical skills, is proficient with big data technologies, and has a deep understanding of data architecture and ETL processes.
Roles & Responsibilities:
- Design, develop, and implement data pipelines, ETL/ELT processes, and data integration solutions
- Take ownership of data pipeline projects from inception to deployment, managing scope, timelines, and risks
- Develop and maintain data models for biopharma scientific data, data dictionaries, and other documentation to ensure data accuracy and consistency
- Optimize large datasets for query performance
- Collaborate with global cross-functional teams including research scientists to understand data requirements and design solutions that meet business needs
- Implement data security and privacy measures to protect sensitive data
- Leverage cloud platforms (AWS preferred) to build scalable and efficient data solutions
- Collaborate with Data Architects, Business SMEs, Software Engineers, and Data Scientists to design and develop end-to-end data pipelines that meet fast-paced business needs across geographic regions
- Identify and resolve complex data-related challenges
- Adhere to best practices for coding, testing, and designing reusable code and components
- Explore new tools and technologies that will help to improve ETL platform performance
- Participate in sprint planning meetings and provide estimations on technical implementation
- Maintain comprehensive documentation of processes, systems, and solutions
Basic Qualifications and Experience:
- Doctorate degree OR
- Master's degree with 4 to 6 years of experience in Computer Science, IT, Computational Chemistry, Computational Biology/Bioinformatics, or a related field OR
- Bachelor's degree with 6 to 8 years of experience in Computer Science, IT, Computational Chemistry, Computational Biology/Bioinformatics, or a related field OR
- Diploma with 10 to 12 years of experience in Computer Science, IT, Computational Chemistry, Computational Biology/Bioinformatics, or a related field
Preferred Qualifications and Experience:
- 3+ years of experience implementing and supporting software platforms for biopharma scientific research data analytics
Functional Skills:
Must-Have Skills:
- Proficiency in SQL and Python for data engineering, test automation frameworks (pytest), and scripting tasks
- Hands-on experience with big data technologies and platforms, such as Databricks and Apache Spark (PySpark, SparkSQL), including workflow orchestration and performance tuning of big data processing
- Excellent problem-solving skills and the ability to work with large, complex datasets
Good-to-Have Skills:
- A passion for tackling complex challenges in drug discovery with technology and data
- Strong understanding of data modeling, data warehousing, and data integration concepts
- Strong experience using RDBMS (e.g., Oracle, MySQL, SQL Server, PostgreSQL)
- Knowledge of cloud data platforms (AWS preferred)
- Experience with data visualization tools (e.g., Dash, Plotly, Spotfire)
- Experience with diagramming and collaboration tools, such as Miro or Lucidchart, for process mapping and brainstorming
- Experience writing and maintaining technical documentation in Confluence
- Understanding of data governance frameworks, tools, and best practices
Professional Certifications:
- Databricks Certified Data Engineer Professional preferred
Soft Skills:
- Excellent critical-thinking and problem-solving skills
- Strong communication and collaboration skills
- Demonstrated awareness of how to function in a team setting
- Demonstrated presentation skills