About Strand Life Sciences
Strand is a 23-year-old spin-off from the Indian Institute of Science and one of India's foremost genomics companies, serving a global customer base across clinical diagnostics, pharma, and translational research. We build algorithms, data pipelines, and visualisations that help clinicians and researchers unlock biological insight from large-scale sequencing data. As next-generation sequencing becomes routine in clinical practice, the work done at Strand sits at the frontier of that transformation.
About the Role
We are looking for a rigorous, hands-on Bioinformatics Engineer to join a high-performing project team focused on somatic variant and copy-number analysis for oncology applications. The person stepping into this role is expected to meet a high standard from day one, both technically and in documentation quality, cross-functional communication, and scientific rigour. The core of the work is building, customising, and benchmarking pipelines for somatic SNV/indel and CNV detection from paired tumour-normal and tumour-only short-read sequencing data, with a strong emphasis on tumour purity estimation, focal CNV calling, QC framework design, and algorithm development for targeted consumer and clinical genomics workflows.
Key Responsibilities
Pipeline Development & Variant Calling
- Design, implement, and maintain production-grade somatic SNV/indel pipelines using tools such as Mutect2, Strelka2, VarScan2, and custom ensemble callers for both paired tumour-normal and tumour-only modes.
- Build and customise allele-specific CNV calling workflows integrating tools including, but not limited to, CNVkit, PureCN, GATK CNV, FREEC, and AscatNGS, calibrated for panel, WES, and low-pass WGS modalities.
- Develop and refine tumour purity and ploidy estimation approaches; understand and implement WGS/WES-based purity inference using allele frequency distributions, copy-number fits, and cfDNA-specific adjustments where relevant.
- Extend and customise existing open-source tools (CNVkit, PureCN, Mutect2 filters) to meet project-specific QC thresholds, coverage requirements, and allele frequency sensitivity targets.
- Implement Nextflow DSL2 workflows for end-to-end pipeline orchestration in on-premises HPC environments.
Algorithm Development & Statistical Hypothesis Testing
- Formulate and test statistical hypotheses for new variant-calling and CNV segmentation algorithms targeted at consumer and clinical genomics applications, from power calculations and simulation to empirical validation on real datasets.
- Design and execute benchmarking studies evaluating tool sensitivity, specificity, LOD, and FDR across FFPE, fresh-frozen, and liquid biopsy samples using truth sets (SEQC2, Genome in a Bottle, synthetic mixtures).
- Develop statistical frameworks for setting thresholds on QC metrics such as coverage uniformity, tumour fraction sufficiency, strand bias, base quality, and mapping artefacts, with explicit pass/fail decision logic.
- Prototype and evaluate novel approaches for low-allele-frequency variant rescue, sub-clonal CNV detection, and artefact filtering in tumour-only settings.
Quality Control & Scientific Rigour
- Own QC frameworks at every pipeline stage: pre-alignment (FastQC, fastp), alignment (duplicate rates, on-target %, coverage uniformity), variant-level (VAF distributions, Ti/Tv ratios, strand bias), and report-level (concordance, reproducibility).
- Maintain comprehensive documentation for all pipelines, parameter choices, validation results, and known limitations to a standard that supports regulatory submissions and client audits.
- Track and version all workflow configurations; ensure all analyses are reproducible and traceable to sample, pipeline version, and reference build.
Literature Engagement & Scientific Communication
- Actively monitor and critically evaluate the latest somatic variant calling, CNV analysis, tumour purity, and liquid biopsy literature; synthesise findings into concise internal reports and presentations.
- Present benchmarking results, algorithmic updates, and QC summaries to internal scientific leads, clinical partners, and pharma clients in a clear, data-driven manner.
- Translate published methods into practical implementations, including a clear assessment of reproducibility, dataset applicability, and performance trade-offs.
Collaboration & Mentorship
- Work closely with software engineers, clinical scientists, and project managers to align bioinformatics deliverables with project milestones and client requirements.
- Provide technical reviews of code, pipeline configurations, and analysis outputs from junior team members.
- Contribute to internal knowledge-sharing through code documentation, runbooks, and structured onboarding materials.
Required Skills & Qualifications
Core Technical
- Language: Python
  - Primary development language for pipeline logic, variant annotation, statistical testing, and tool customisation
  - Expected proficiency: object-oriented code, unit testing, environment management (conda/pip), subprocess/CLI integration
- Somatic Variant Calling
  - Hands-on experience with Mutect2, Strelka2, or equivalent for SNV/indel detection in tumour-only and tumour-normal modes
  - Understanding of FILTER field logic, panel-of-normals (PoN) construction, and artefact classes (OxoG, FFPE damage, alignment artefacts)
- Somatic CNV Analysis
  - Practical experience with CNVkit and/or PureCN for targeted panel and WES data
  - Familiarity with segmentation algorithms (CBS, HMM), log2 ratio normalisation, and allele-specific copy-number estimation
- Tumour Purity & Ploidy Estimation
  - Ability to interpret and validate purity estimates from VAF distributions, coverage ratios, and B-allele frequency plots
  - Awareness of purity confounders: clonality, subclonal CNVs, normal contamination
- Environment: Linux / Shell Scripting
  - Comfortable in HPC/cloud environments; proficiency in bash scripting, job scheduling, and debugging pipeline failures
- Version Control & Reproducibility
  - Git-based development workflow; familiarity with container-based reproducibility (Docker/Singularity)
Strongly Preferred
- R: Statistical analysis, ggplot2 visualisation, and custom QC report generation; experience with data.table or tidyverse for large genomic data frames
- Nextflow DSL2: Pipeline authoring, module development, and execution on AWS (HealthOmics, Batch) or SLURM/HPC
- Germline Variant Calling: Understanding of germline SNV/indel pipelines (GATK HaplotypeCaller, DeepVariant) to contextualise somatic filtering strategies and germline contamination detection
- Benchmarking Methodology: Experience designing truth-set-based evaluations; familiarity with SEQC2, GIAB, or synthetic mixture datasets; knowledge of ROC/precision-recall frameworks for variant callers
- AWS Cloud: S3 data management, Batch job orchestration, IAM and cost-control practices in a bioinformatics context
Soft Skills & Work Style
- Documentation discipline: Writes clear, complete analysis logs, parameter rationale docs, and pipeline runbooks without prompting
- Scientific communication: Comfortable presenting complex results to technical and non-technical audiences alike, with clean, well-labelled figures
- Literature fluency: Stays current with the primary literature, identifies methodologically sound papers, and flags relevant advances proactively
- Intellectual rigour: Questions assumptions, checks edge cases, raises concerns early; prefers correctness over speed
- Project alignment: Deeply understands the goals of the project before coding begins; asks clarifying questions rather than making undocumented assumptions