Bioinformatics Engineer

Strand Life Sciences

India

Fresher

Posted 19 hours ago

Job Description

About Strand Life Sciences

Strand is a 23-year-old spin-off from the Indian Institute of Science and one of India's foremost genomics companies, serving a global customer base across clinical diagnostics, pharma, and translational research. We build algorithms, data pipelines, and visualisations that help clinicians and researchers unlock biological insight from large-scale sequencing data. As next-generation sequencing becomes routine in clinical practice, the work done at Strand sits at the frontier of that transformation.

About the Role

We are looking for a rigorous, hands-on Bioinformatics Engineer to join a high-performing project team focused on somatic variant and copy-number analysis for oncology applications. The person stepping into this role is expected to meet a high standard from day one, both technically and in documentation quality, cross-functional communication, and scientific rigour. The core of the work is building, customising, and benchmarking pipelines for somatic SNV/indel and CNV detection from paired tumour-normal and tumour-only short-read sequencing data, with a strong emphasis on tumour purity estimation, focal CNV calling, QC framework design, and algorithm development for targeted consumer and clinical genomics workflows.

Key Responsibilities

Pipeline Development & Variant Calling

Design, implement, and maintain production-grade somatic SNV/indel pipelines using tools such as Mutect2, Strelka2, VarScan2, and custom ensemble callers for both paired tumour-normal and tumour-only modes.
Build and customise allele-specific CNV calling workflows integrating tools including (but not limited to) CNVkit, PureCN, GATK CNV, FREEC, and AscatNGS, calibrated for panel, WES, and low-pass WGS modalities.
Develop and refine tumour purity and ploidy estimation approaches; understand and implement WGS/WES-based purity inference using allele frequency distributions, copy-number fits, and cfDNA-specific adjustments where relevant.
Extend and customise existing open-source tools (CNVkit, PureCN, Mutect2 filters) to meet project-specific QC thresholds, coverage requirements, and allele frequency sensitivity targets.
Implement Nextflow DSL2 workflows for end-to-end pipeline orchestration in on-premise HPC environments.

Algorithm Development & Statistical Hypothesis Testing

Formulate and test statistical hypotheses for new variant-calling and CNV segmentation algorithms targeted at consumer and clinical genomics applications, from power calculations and simulation to empirical validation on real datasets.
Design and execute benchmarking studies evaluating tool sensitivity, specificity, LOD, and FDR across FFPE, fresh-frozen, and liquid biopsy samples using truth sets (SEQC2, Genome in a Bottle, synthetic mixtures).
Develop statistical frameworks for setting thresholds on QC metrics such as coverage uniformity, tumour fraction sufficiency, strand bias, base quality, and mapping artefacts, with pass/fail decision logic.
Prototype and evaluate novel approaches for low-allele-frequency variant rescue, sub-clonal CNV detection, and artefact filtering in tumour-only settings.

Quality Control & Scientific Rigour

Own QC frameworks at every pipeline stage: pre-alignment (FastQC, fastp), alignment (duplicate rates, on-target %, coverage uniformity), variant-level (VAF distributions, Ti/Tv ratios, strand bias), and report-level (concordance, reproducibility).
Maintain comprehensive documentation for all pipelines, parameter choices, validation results, and known limitations to a standard that supports regulatory submissions and client audits.
Track and version all workflow configurations; ensure all analyses are reproducible and traceable to sample, pipeline version, and reference build.

Literature Engagement & Scientific Communication

Actively monitor and critically evaluate the latest somatic variant calling, CNV analysis, tumour purity, and liquid biopsy literature; synthesise findings into concise internal reports and presentations.
Present benchmarking results, algorithmic updates, and QC summaries to internal scientific leads, clinical partners, and pharma clients in a clear, data-driven manner.
Translate published methods into practical implementations, including a clear assessment of reproducibility, dataset applicability, and performance trade-offs.

Collaboration & Mentorship

Work closely with software engineers, clinical scientists, and project managers to align bioinformatics deliverables with project milestones and client requirements.
Provide technical reviews of code, pipeline configurations, and analysis outputs from junior team members.
Contribute to internal knowledge-sharing through code documentation, runbooks, and structured onboarding materials.

Required Skills & Qualifications

Core Technical

Python: Primary development language for pipeline logic, variant annotation, statistical testing, and tool customisation. Expected proficiency: object-oriented code, unit testing, environment management (conda/pip), and subprocess/CLI integration
Somatic Variant Calling: Hands-on experience with Mutect2, Strelka2, or equivalent for SNV/indel detection in tumour-only and tumour-normal modes; understanding of FILTER field logic, panel-of-normals (PoN) construction, and artefact classes (OxoG, FFPE damage, alignment artefacts)
Somatic CNV Analysis: Practical experience with CNVkit and/or PureCN for targeted panel and WES data; familiarity with segmentation algorithms (CBS, HMM), log2 ratio normalisation, and allele-specific copy-number estimation
Tumour Purity & Ploidy Estimation: Ability to interpret and validate purity estimates from VAF distributions, coverage ratios, and B-allele frequency plots; awareness of purity confounders (clonality, subclonal CNVs, normal contamination)
Linux / Shell Scripting: Comfortable in HPC/cloud environments; proficient in bash scripting, job scheduling, and debugging pipeline failures
Version Control & Reproducibility: Git-based development workflow; familiarity with container-based reproducibility (Docker/Singularity)

Strongly Preferred

R: Statistical analysis, ggplot2 visualisation, and custom QC report generation; experience with data.table or tidyverse for large genomic data frames
Nextflow DSL2: Pipeline authoring, module development, and execution on AWS (HealthOmics, Batch) or SLURM/HPC
Germline Variant Calling: Understanding of germline SNV/indel pipelines (GATK HaplotypeCaller, DeepVariant) to contextualise somatic filtering strategies and germline contamination detection
Benchmarking Methodology: Experience designing truth-set-based evaluations; familiarity with SEQC2, GIAB, or synthetic mixture datasets; knowledge of ROC/precision-recall frameworks for variant callers
AWS Cloud: S3 data management, Batch job orchestration, IAM and cost-control practices in a bioinformatics context

Soft Skills & Work Style

Documentation discipline: Writes clear, complete analysis logs, parameter rationale docs, and pipeline runbooks without prompting
Scientific communication: Comfortable presenting complex results to technical and non-technical audiences alike, with clean, well-labelled figures
Literature fluency: Stays current with the primary literature, identifies methodologically sound papers, and proactively flags relevant advances
Intellectual rigour: Questions assumptions, checks edge cases, raises concerns early; prefers correctness over speed
Project alignment: Deeply understands the goals of the project before coding begins; asks clarifying questions rather than making undocumented assumptions