

Position Overview
We are seeking an accomplished Senior Manager / Lead Data Scientist to lead a high-performing team of data scientists and engineers focused on clinical data standardization, ETL workflows, and regulatory-ready data products. This leadership role requires deep expertise in CDISC standards (SDTM, ADaM, TLFs), OMOP Common Data Model, and genomic variant data, combined with proven ability to guide technical teams, architect scalable ETL pipelines, and ensure regulatory compliance across real-world data (RWD), EHR systems, and clinical trial datasets.
The ideal candidate will drive the strategic direction of our clinical data operations, mentor a diverse team of data professionals, and serve as the primary technical authority for OMOP/SDTM transformations and regulatory submissions to agencies such as the FDA and PMDA.
Key Responsibilities
Leadership & Strategy
Lead, mentor, and develop a team of data scientists, data engineers, and analysts working on clinical data standardization and ETL workflows
Define and execute the technical roadmap for OMOP and CDISC-compliant data pipelines, ensuring alignment with business objectives and regulatory requirements
Foster a culture of technical excellence, continuous improvement, and collaborative problem-solving across multidisciplinary teams
Partner with senior leadership to shape data strategy for precision medicine, regulatory submissions, and real-world evidence generation
Drive adoption of best practices in metadata-driven automation, reproducible workflows, and quality assurance frameworks
Technical Architecture & Delivery
Design and oversee end-to-end ETL architectures for converting heterogeneous clinical, EHR, and real-world data sources into OMOP CDM, SDTM, ADaM, and TLF formats
Establish and maintain production-grade pipelines using open-source workflow orchestration tools (Airflow, Prefect, Nextflow, Luigi) and proprietary systems (SAS DI, Informatica, cloud-native platforms)
Champion the use of OHDSI tools (WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, Achilles, DataQualityDashboard) for OMOP transformations and quality validation
Ensure adherence to CDISC 360 metadata standards, Define.xml generation, controlled terminology management, and SDTM/ADaM conformance
Implement robust data quality, validation, and reconciliation processes across all stages of ETL, leveraging Pinnacle 21 and custom QC frameworks
Regulatory & Compliance
Serve as the subject matter expert for regulatory submission-ready datasets, ensuring timely and accurate delivery of SDTM/ADaM/TLFs to FDA, EMA, and PMDA
Collaborate with biostatistics, clinical operations, regulatory affairs, and quality assurance teams to meet submission timelines and compliance standards
Provide expert guidance on data privacy, security, and governance in alignment with HIPAA, GDPR, ICH GCP, and ISO 27001/27701 standards
Review and approve Define.xml, Reviewer's Guides, aCRFs, and other submission documentation for regulatory packages
Genomic & Variant Data Specialization
Lead initiatives for curating, harmonizing, and annotating genomic variant datasets from public and proprietary sources (ClinVar, ClinGen, HGMD, CADD, gnomAD, dbSNP, COSMIC, RefSeq, REVEL)
Oversee ETL pipelines for mapping VCF annotation files to OMOP genomic tables and CDISC submission formats
Ensure quality control of variant annotations, reference genome build consistency (GRCh37/38), and adherence to HGVS nomenclature
Stay current with emerging variant annotation standards, genomic data formats (VCF, BED, GFF), and translational research methodologies
Stakeholder Engagement
Act as the primary liaison between technical teams, clinical operations, statistical programming, and external partners on data standards and interoperability
Translate complex technical challenges into business-friendly solutions and communicate risks, trade-offs, and opportunities to senior stakeholders
Represent the organization in industry forums, CDISC working groups, OHDSI community events, and regulatory interactions
Required Qualifications
Education
Ph.D. in Bioinformatics, Health Informatics, Computational Biology, Genomics, Biomedical Engineering, Clinical Data Science, or related quantitative field
Candidates with an M.S., an exceptional leadership track record, and 7+ years of relevant experience may be considered
Experience
7+ years of progressive experience in clinical data science, bioinformatics, or health data engineering roles
3+ years in a leadership or team-lead capacity, managing cross-functional technical teams (data scientists, engineers, analysts)
Proven track record of delivering regulatory-ready SDTM/ADaM datasets for FDA/EMA/PMDA submissions
Deep hands-on experience with OMOP CDM and OHDSI ecosystem (WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, ATLAS, Achilles)
Extensive experience building and maintaining production ETL pipelines for clinical trials, RWD, EHR, and genomic data
Demonstrated expertise in CDISC standards (SDTM, ADaM) and associated documentation (Define.xml, Reviewer's Guides, aCRF)
Technical Skills (Core)
Programming & Scripting: Expert-level proficiency in Python, R, SQL; strong working knowledge of SAS (Base, Macro, Studio)
ETL & Workflow Orchestration: Hands-on experience with Airflow, Nextflow, Prefect, Luigi, dbt, or equivalent platforms
Clinical Data Standards: OMOP CDM, CDISC SDTM, ADaM, controlled terminologies (MedDRA, SNOMED CT, LOINC, RxNorm, ICD-10)
OHDSI Tools: WhiteRabbit, Rabbit-in-a-Hat, ETL-CDMBuilder, ATLAS, Achilles, DataQualityDashboard
Genomic Data: VCF, BED, GFF formats; reference genomes (GRCh37/38); HGVS nomenclature; variant annotation databases
Data Quality & Validation: Pinnacle 21, custom QC frameworks, automated testing, Define.xml validation
Cloud & Databases: SQL (PostgreSQL, MySQL, SQL Server), cloud platforms (AWS, GCP, Azure), data warehousing concepts
Version Control & DevOps: Git/GitHub/GitLab, CI/CD pipelines, Docker, Kubernetes (basic understanding)
Domain Knowledge
In-depth understanding of clinical trials, real-world evidence studies, precision medicine, and translational research
Knowledge of ontologies and controlled vocabularies (ClinVar terms, Sequence Ontology, HPO, OMIM)
Familiarity with cohort-building tools (ATLAS, i2b2, TriNetX) and EHR/claims data structures
Understanding of data harmonization, linkage, and interoperability across heterogeneous sources
Awareness of HL7 FHIR, DICOM, and other health data exchange standards
Leadership & Soft Skills
Proven ability to lead, mentor, and develop technical teams, with emphasis on coaching junior and mid-level data scientists
Strong strategic thinking and ability to translate business needs into technical solutions
Excellent communication and presentation skills, with experience presenting to executive leadership and regulatory authorities
Collaborative mindset, capable of working across functions (clinical, biostatistics, IT, regulatory, quality)
Problem-solving mindset, attention to detail, and commitment to data integrity and quality excellence
Fluent in English; additional languages a plus
Preferred Qualifications
Certifications: CDISC SDTM/ADaM training certification; HL7 FHIR Proficiency; AWS Certified Solutions Architect / GCP Professional Data Engineer / Azure Data Engineer Associate
Statistical Programming: Experience with SAS statistical procedures, double programming workflows, TLF shell development
NLP & AI: Exposure to natural language processing applications on clinical narratives, adverse event coding, or generative AI for SDTM/ADaM automation
Data Visualization: Proficiency in Tableau, Power BI, or custom dashboards (Plotly, Shiny) for stakeholder reporting
Keywords
OMOP, SDTM, ADaM, TLFs, CDISC, OHDSI, ETL, EHR, RWD, Clinical Trials, Regulatory Submissions, FDA, PMDA, WhiteRabbit, Rabbit-in-a-Hat, Pinnacle 21, Genomic Variants, VCF, HGVS, ClinVar, Python, R, SAS, SQL, Airflow, Nextflow, Define.xml, Data Quality, Team Leadership, Bioinformatics, Precision Medicine
Job ID: 143398897