CLINICAL DATA ENGINEER
The Clinical Data Engineering team provides strategic planning, integration, execution, build, and oversight of clinical trial deliverables. The Clinical Data Engineer (CDE) leads the integration, design, development, and execution of data pipelines for the ingestion of clinical data from all sources at both the enterprise and study levels. The CDE is an enterprise-level role and is primarily responsible for ensuring smooth end-to-end processes for data collection and ingestion from all data collection sources, delivering output into a data lake that is fit for use by downstream end users.
Accountabilities
- Serve as a technical expert in building data pipelines for the ingestion and delivery of clinical data at the study level, supporting study start-up, conduct, and close-out activities.
- Lead the planning, execution, and delivery of data pipeline initiatives across clinical trials.
- Develop robust data pipelines for integrating heterogeneous data sources.
- Identify, design, and implement scalable data delivery solutions, automating manual processes whenever possible.
- Provide technical leadership in all aspects of clinical data flow, including defining, building, and validating application programming interfaces (APIs), data streams, and data staging for extraction and integration across systems.
- Manage, maintain, and troubleshoot pipelines within data lakes or data warehouses, ensuring ongoing reliability and quality.
- Develop and implement comprehensive data integrity and quality checks throughout the data ingestion process.
- Prepare functional areas for submission readiness and represent the Clinical Data Engineering group in formal inspections or audits.
- Represent Takeda in interactions with key external partners.
- Design and build infrastructure for optimal data extraction, transformation, and loading (ETL/ELT) using cloud platforms such as AWS or Azure.
- Collaborate with downstream users, including statistical programmers, SDTM programmers, analytics, and clinical data programmers, to ensure deliverables meet end-user requirements.
- Appropriately escalate issues to CDE leadership as needed.
Job Description
Education & Competencies (Technical and Behavioral):
- BS/BA required in a health-related, life science, or technology-related field.
- Minimum of 5 years of technology experience.
- May lead study-level negotiation and agreement for data transfer or integration on behalf of the Sponsor.
- Able to work collaboratively (with some guidance) with employees at all levels.
- Critical thinking.
Technical/Functional (Line) Expertise
- Proficient with Python, SQL, and NoSQL databases.
- Hands-on cloud experience with AWS/Azure/GCP.
- Hands-on experience with big data processing frameworks such as Apache Spark.
- Familiarity with at least one of GitLab, GitHub, or Jenkins for version control and CI/CD.
- Proven expertise in deploying data pipelines in cloud environments and scheduling them with Airflow.
- Skilled in setting up and managing data warehouses and data lakes (e.g., Snowflake, Amazon Redshift).
- Proficient in designing, developing, and maintaining scalable data pipelines for large datasets.
- Strong understanding of database concepts, with working knowledge of XML, JSON, and API integrations.
- Solid experience applying the System Development Life Cycle (SDLC).
Skills: Snowflake, API, SDTM, Jenkins, Amazon Redshift, data lakes, clinical trial, ETL, CI/CD