Molecular Connections

Senior Manager / Lead Data Scientist (Clinical Data Standardization & ETL Operations)

Job Description

Position Overview

We are seeking an accomplished Senior Manager / Lead Data Scientist to lead a high-performing team of data scientists and engineers focused on clinical data standardization, ETL workflows, and regulatory-ready data products. This leadership role requires deep expertise in CDISC standards (SDTM, ADaM, TLFs), the OMOP Common Data Model, and genomic variant data. It also requires a proven ability to guide technical teams, architect scalable ETL pipelines, and ensure regulatory compliance across real-world data (RWD), EHR systems, and clinical trial datasets.

The ideal candidate will drive the strategic direction of our clinical data operations, mentor a diverse team of data professionals, and serve as the primary technical authority for OMOP/SDTM transformations and regulatory submissions to agencies such as the FDA and PMDA.

Key Responsibilities

Leadership & Strategy

Lead, mentor, and develop a team of data scientists, data engineers, and analysts working on clinical data standardization and ETL workflows

Define and execute the technical roadmap for OMOP- and CDISC-compliant data pipelines, ensuring alignment with business objectives and regulatory requirements

Foster a culture of technical excellence, continuous improvement, and collaborative problem-solving across multidisciplinary teams

Partner with senior leadership to shape data strategy for precision medicine, regulatory submissions, and real-world evidence generation

Drive adoption of best practices in metadata-driven automation, reproducible workflows, and quality assurance frameworks

Technical Architecture & Delivery

Design and oversee end-to-end ETL architectures for converting heterogeneous clinical, EHR, and real-world data sources into OMOP CDM, SDTM, ADaM, and TLF formats

Establish and maintain production-grade pipelines using open-source workflow orchestration tools (Airflow, Prefect, Nextflow, Luigi) and proprietary systems (SAS DI, Informatica, cloud-native platforms)

Champion the use of OHDSI tools (WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, Achilles, DataQualityDashboard) for OMOP transformations and quality validation

Ensure adherence to CDISC 360 metadata standards, Define.xml generation, controlled terminology management, and SDTM/ADaM conformance

Implement robust data quality, validation, and reconciliation processes across all stages of ETL, leveraging Pinnacle 21 and custom QC frameworks
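As an illustration of what stage-level reconciliation can look like in practice, here is a minimal Python sketch. It assumes each ETL stage can report row counts per domain (for example, SDTM domains such as AE or DM); the function name `reconcile_counts` and its tolerance parameter are hypothetical and not part of Pinnacle 21 or any other named tool.

```python
from typing import Dict, List


def reconcile_counts(source: Dict[str, int], target: Dict[str, int],
                     tolerance: float = 0.0) -> List[str]:
    """Compare per-domain record counts between two ETL stages.

    Hypothetical helper: returns human-readable discrepancy messages.
    An empty list means every domain reconciled within the allowed
    relative tolerance (fraction of the source count).
    """
    issues: List[str] = []
    for domain in sorted(set(source) | set(target)):
        src = source.get(domain)
        tgt = target.get(domain)
        if src is None:
            issues.append(f"{domain}: present in target only ({tgt} rows)")
        elif tgt is None:
            issues.append(f"{domain}: missing from target ({src} rows in source)")
        else:
            # Relative difference against the source count; a zero source
            # count only reconciles if the target is also zero.
            diff = abs(src - tgt) / src if src else float(tgt != 0)
            if diff > tolerance:
                issues.append(f"{domain}: source={src}, target={tgt}")
    return issues


if __name__ == "__main__":
    print(reconcile_counts({"AE": 120, "DM": 50}, {"AE": 118, "DM": 50}))
```

Running the usage example reports the two missing AE rows while DM reconciles cleanly. A real QC framework would typically also compare key-level content, not just counts.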

Regulatory & Compliance

Serve as the subject matter expert for regulatory submission-ready datasets, ensuring timely and accurate delivery of SDTM/ADaM/TLFs to FDA, EMA, and PMDA

Collaborate with biostatistics, clinical operations, regulatory affairs, and quality assurance teams to meet submission timelines and compliance standards

Provide expert guidance on data privacy, security, and governance in alignment with HIPAA, GDPR, ICH GCP, and ISO 27001/27701 standards

Review and approve Define.xml, Reviewer's Guides, aCRFs, and other submission documentation for regulatory packages

Genomic & Variant Data Specialization

Lead initiatives for curating, harmonizing, and annotating genomic variant datasets from public and proprietary sources (ClinVar, ClinGen, HGMD, CADD, gnomAD, dbSNP, COSMIC, RefSeq, REVEL)

Oversee ETL pipelines for mapping VCF annotation files to OMOP genomic tables and CDISC submission formats

Ensure quality control of variant annotations, reference genome build consistency (GRCh37/38), and adherence to HGVS nomenclature

Stay current with emerging variant annotation standards, genomic data formats (VCF, BED, GFF), and translational research methodologies
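The reference genome build consistency check described above can be sketched in Python as follows. This sketch assumes that the build is recorded somewhere in the `##reference` or `##contig` header lines. The VCF specification allows these lines but does not fix how a build is named, so the alias table (GRCh37/hg19/b37, GRCh38/hg38) is a hypothetical heuristic rather than a standard. `detect_build` and `files_off_build` are illustrative names.

```python
import re
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical alias table: common spellings of each build seen in VCF headers.
_BUILD_PATTERNS = {
    "GRCh37": re.compile(r"GRCh37|hg19|b37", re.IGNORECASE),
    "GRCh38": re.compile(r"GRCh38|hg38", re.IGNORECASE),
}


def detect_build(header_lines: Iterable[str]) -> Optional[str]:
    """Infer the reference build from '##reference' and '##contig' lines.

    Returns None when no build is found or when the header mentions
    conflicting builds.
    """
    found: Set[str] = set()
    for line in header_lines:
        if not (line.startswith("##reference") or line.startswith("##contig")):
            continue
        for build, pattern in _BUILD_PATTERNS.items():
            if pattern.search(line):
                found.add(build)
    return found.pop() if len(found) == 1 else None


def files_off_build(headers_by_file: Dict[str, List[str]], expected: str) -> List[str]:
    """List files whose detected build is missing, ambiguous, or not `expected`."""
    return sorted(name for name, lines in headers_by_file.items()
                  if detect_build(lines) != expected)


if __name__ == "__main__":
    headers = {
        "a.vcf": ["##fileformat=VCFv4.2", "##reference=GRCh38"],
        "b.vcf": ["##reference=file:///ref/hg19.fa"],
        "c.vcf": ["##fileformat=VCFv4.2"],
    }
    print(files_off_build(headers, "GRCh38"))
```

Flagging files with no detectable build, not only files on the wrong build, is a deliberately conservative choice. In a harmonized dataset, an undeclared build is as much a QC risk as a mismatched one.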

Stakeholder Engagement

Act as the primary liaison between technical teams, clinical operations, statistical programming, and external partners on data standards and interoperability

Translate complex technical challenges into business-friendly solutions and communicate risks, trade-offs, and opportunities to senior stakeholders

Represent the organization in industry forums, CDISC working groups, OHDSI community events, and regulatory interactions

Required Qualifications

Education

Ph.D. in Bioinformatics, Health Informatics, Computational Biology, Genomics, Biomedical Engineering, Clinical Data Science, or related quantitative field

Candidates with an M.S., an exceptional leadership track record, and 7+ years of relevant experience may be considered

Experience

7+ years of progressive experience in clinical data science, bioinformatics, or health data engineering roles

3+ years in leadership or team lead capacity, managing cross-functional technical teams (data scientists, engineers, analysts)

Proven track record of delivering regulatory-ready SDTM/ADaM datasets for FDA/EMA/PMDA submissions

Deep hands-on experience with OMOP CDM and OHDSI ecosystem (WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, ATLAS, Achilles)

Extensive experience building and maintaining production ETL pipelines for clinical trials, RWD, EHR, and genomic data

Demonstrated expertise in CDISC standards (SDTM, ADaM) and associated documentation (Define.xml, Reviewer's Guides, aCRF)

Technical Skills (Core)

Programming & Scripting: Expert-level proficiency in Python, R, SQL; strong working knowledge of SAS (Base, Macro, Studio)

ETL & Workflow Orchestration: Hands-on experience with Airflow, Nextflow, Prefect, Luigi, dbt, or equivalent platforms

Clinical Data Standards: OMOP CDM, CDISC SDTM, ADaM, controlled terminologies (MedDRA, SNOMED CT, LOINC, RxNorm, ICD-10)

OHDSI Tools: WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, ATLAS, Achilles, DataQualityDashboard

Genomic Data: VCF, BED, GFF formats; reference genomes (GRCh37/38); HGVS nomenclature; variant annotation databases

Data Quality & Validation: Pinnacle 21, custom QC frameworks, automated testing, Define.xml validation

Cloud & Databases: Relational databases (PostgreSQL, MySQL, SQL Server), cloud platforms (AWS, GCP, Azure), data warehousing concepts

Version Control & DevOps: Git/GitHub/GitLab, CI/CD pipelines, Docker, Kubernetes (basic understanding)

Domain Knowledge

In-depth understanding of clinical trials, real-world evidence studies, precision medicine, and translational research

Knowledge of ontologies and controlled vocabularies (ClinVar terms, Sequence Ontology, HPO, OMIM)

Familiarity with cohort-building tools (ATLAS, i2b2, TriNetX) and EHR/claims data structures

Understanding of data harmonization, linkage, and interoperability across heterogeneous sources

Awareness of HL7 FHIR, DICOM, and other health data exchange standards

Leadership & Soft Skills

Proven ability to lead, mentor, and develop technical teams, with emphasis on coaching junior and mid-level data scientists

Strong strategic thinking and ability to translate business needs into technical solutions

Excellent communication and presentation skills, with experience presenting to executive leadership and regulatory authorities

Collaborative mindset, capable of working across functions (clinical, biostatistics, IT, regulatory, quality)

Strong problem-solving skills, attention to detail, and a commitment to data integrity and quality excellence

Fluent in English; additional languages a plus

Preferred Qualifications

Certifications: CDISC SDTM/ADaM training certification; HL7 FHIR Proficiency; AWS Certified Solutions Architect / GCP Professional Data Engineer / Azure Data Engineer Associate

Statistical Programming: Experience with SAS statistical procedures, double programming workflows, TLF shell development

NLP & AI: Exposure to natural language processing applications on clinical narratives, adverse event coding, or generative AI for SDTM/ADaM automation

Data Visualization: Proficiency in Tableau, Power BI, or custom dashboards (Plotly, Shiny) for stakeholder reporting

Keywords

OMOP, SDTM, ADaM, TLFs, CDISC, OHDSI, ETL, EHR, RWD, Clinical Trials, Regulatory Submissions, FDA, PMDA, WhiteRabbit, Rabbit-in-a-Hat, Pinnacle 21, Genomic Variants, VCF, HGVS, ClinVar, Python, R, SAS, SQL, Airflow, Nextflow, Define.xml, Data Quality, Team Leadership, Bioinformatics, Precision Medicine

Job ID: 143398897