Designation: Sr. Data Scientist
Work Location: Hyderabad (Hybrid Model)
Experience: 5+ Years
Core Responsibilities
- Multimodal Enrollment Forecasting: Build hierarchical models that forecast Top-Down (Country/Study) and Bottom-Up (Site-level) enrollment by ingesting real-time screening logs from IRT and site-activation milestones from CTMS.
- Discontinuation & Attrition Modeling: Implement Survival Analysis (Cox-PH, DeepSurv) and RNNs to predict patient dropout probability from longitudinal EDC data; dropout is the primary driver of Maintenance Phase demand.
- Demand vs. Supply Optimization: Develop Monte Carlo simulations or Stochastic Optimization models to determine safety stock levels, balancing the variance between predicted enrollment and actual inventory on hand.
- Dose Titration Logic: Build predictive ML models to anticipate dose escalations or reductions, syncing with IRT dispensing data to ensure the correct kit strength is available at the site before the patient's next visit.
- Clinical Data Lake Management: Architect unified data pipelines that join EDC (clinical outcomes/visit data) with IRT (supply/randomisation data).
- Manage the full ML lifecycle (Tracking, Registry, Serving) to ensure model reproducibility.
- Build resilient, real-time pipelines for monitoring supply-demand signals and triggering automated alerts for potential stock-outs.
Required Technical Expertise
- Systems Integration: Proven experience processing and feature-engineering data from EDC (e.g., Medidata Rave, Veeva) and IRT/RTSM platforms.
- Advanced ML Domains:
  - Time-Series: DeepAR, Temporal Fusion Transformers (TFT), or N-BEATS for non-linear recruitment trends.
  - Survival Analysis: Expert-level experience modeling Time-to-Event data to handle censored patient discontinuation patterns.
  - Probabilistic Programming & Optimization: Experience with PyMC for probabilistic modeling, or with Gurobi/OR-Tools for constrained optimization, to solve the Supply vs. Demand constraint problem.
- Data Engineering: Expert-level Python, SQL, and distributed computing for processing large-scale, high-velocity clinical datasets.
Clinical Domain Knowledge (Preferred)
- Clinical Systems: Deep understanding of the data schemas within IRT/RTSM (Randomisation/Dispensing) and EDC (Patient Visits/Adverse Events).
- Supply Dynamics: Understanding of Initial Seeding, Trigger-based Resupply, and Dose Titration within a global trial context.
- Regulatory Context: Experience working within GxP / 21 CFR Part 11 compliant environments, ensuring model auditability.
- Standards: Knowledge of CDISC (SDTM/ADaM) data structures is a significant plus.
Skills: AI/ML, Python, Data Science