Strand Life Sciences

Bioinformatics Engineer

Fresher

Job Description

About Strand Life Sciences

Strand is a 23-year-old spin-off from the Indian Institute of Science and one of India's foremost genomics companies, serving a global customer base across clinical diagnostics, pharma, and translational research. We build algorithms, data pipelines, and visualisations that help clinicians and researchers unlock biological insight from large-scale sequencing data. As next-generation sequencing becomes routine in clinical practice, the work done at Strand sits at the frontier of that transformation.

About the Role

We are looking for a rigorous, hands-on Bioinformatics Engineer to join a high-performing project team focused on somatic variant and copy-number analysis for oncology applications. The person stepping into this role is expected to meet a high standard from day one, both technically and in documentation quality, cross-functional communication, and scientific rigour. The core of the work is building, customising, and benchmarking pipelines for somatic SNV/indel and CNV detection from paired tumour-normal and tumour-only short-read sequencing data, with a strong emphasis on tumour purity estimation, focal CNV calling, QC framework design, and algorithm development for targeted consumer and clinical genomics workflows.

Key Responsibilities

Pipeline Development & Variant Calling

  • Design, implement, and maintain production-grade somatic SNV/indel pipelines using tools such as Mutect2, Strelka2, VarScan2, and custom ensemble callers for both paired tumour-normal and tumour-only modes.
  • Build and customise allele-specific CNV calling workflows integrating tools such as CNVkit, PureCN, GATK CNV, FREEC, and AscatNGS, calibrated for panel, WES, and low-pass WGS modalities.
  • Develop and refine tumour purity and ploidy estimation approaches; understand and implement WGS/WES-based purity inference using allele frequency distributions, copy-number fits, and cfDNA-specific adjustments where relevant.
  • Extend and customise existing open-source tools (CNVkit, PureCN, Mutect2 filters) to meet project-specific QC thresholds, coverage requirements, and allele frequency sensitivity targets.
  • Implement Nextflow DSL2 workflows for end-to-end pipeline orchestration in on-premises HPC environments.

Algorithm Development & Statistical Hypothesis Testing

  • Formulate and test statistical hypotheses for new variant-calling and CNV segmentation algorithms targeted at consumer and clinical genomics applications, from power calculations and simulation to empirical validation on real datasets.
  • Design and execute benchmarking studies evaluating tool sensitivity, specificity, LOD, and FDR across FFPE, fresh-frozen, and liquid biopsy samples using truth sets (SEQC2, Genome in a Bottle, synthetic mixtures).
  • Develop statistical frameworks for setting QC metric thresholds (coverage uniformity, tumour fraction sufficiency, strand bias, base quality, and mapping artefacts), with pass/fail decision logic.
  • Prototype and evaluate novel approaches for low-allele-frequency variant rescue, sub-clonal CNV detection, and artefact filtering in tumour-only settings.
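
As a rough illustration of the benchmarking arithmetic mentioned above, the following is a minimal sketch under stated assumptions, not Strand's actual framework. It assumes calls and truth-set entries have already been normalised to (chrom, pos, ref, alt) tuples and are compared by exact match; the function name `benchmark_calls` and the toy variants are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: variants are already left-aligned and normalised upstream.
Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def benchmark_calls(calls: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Compare a call set against a truth set by exact variant match."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but missed
    nan = float("nan")
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "fdr": fp / (tp + fp) if (tp + fp) else nan,
    }


# Hypothetical toy data, for illustration only.
truth = {
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr2", 50, "C", "CT"),
    ("chr3", 10, "T", "G"),
}
calls = {
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr2", 50, "C", "CT"),
    ("chr5", 5, "A", "G"),
}
metrics = benchmark_calls(calls, truth)
print(metrics)
```

Specificity is left out on purpose: it needs a count of true negatives, which depends on how the confident regions of the truth set are defined.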

Quality Control & Scientific Rigour

  • Own QC frameworks at every pipeline stage: pre-alignment (FastQC, fastp), alignment (duplicate rates, on-target %, coverage uniformity), variant-level (VAF distributions, Ti/Tv ratios, strand bias), and report-level (concordance, reproducibility).
  • Maintain comprehensive documentation for all pipelines, parameter choices, validation results, and known limitations to a standard that supports regulatory submissions and client audits.
  • Track and version all workflow configurations; ensure all analyses are reproducible and traceable to sample, pipeline version, and reference build.

Literature Engagement & Scientific Communication

  • Actively monitor and critically evaluate the latest somatic variant calling, CNV analysis, tumour purity, and liquid biopsy literature; synthesise findings into concise internal reports and presentations.
  • Present benchmarking results, algorithmic updates, and QC summaries to internal scientific leads, clinical partners, and pharma clients in a clear, data-driven manner.
  • Translate published methods into practical implementations, including a clear assessment of reproducibility, dataset applicability, and performance trade-offs.

Collaboration & Mentorship

  • Work closely with software engineers, clinical scientists, and project managers to align bioinformatics deliverables with project milestones and client requirements.
  • Provide technical reviews of code, pipeline configurations, and analysis outputs from junior team members.
  • Contribute to internal knowledge-sharing through code documentation, runbooks, and structured onboarding materials.

Required Skills & Qualifications

Core Technical

  • Language: Python
      • Primary development language for pipeline logic, variant annotation, statistical testing, and tool customisation
      • Expected proficiency: object-oriented code, unit testing, environment management (conda/pip), subprocess/CLI integration
  • Somatic Variant Calling
      • Hands-on experience with Mutect2, Strelka2, or equivalent for SNV/indel detection in tumour-only and tumour-normal modes
      • Understanding of FILTER field logic, panel-of-normals (PoN) construction, and artefact classes (OxoG, FFPE damage, alignment artefacts)
  • Somatic CNV Analysis
      • Practical experience with CNVkit and/or PureCN for targeted panel and WES data
      • Familiarity with segmentation algorithms (CBS, HMM), log2 ratio normalisation, and allele-specific copy-number estimation
  • Tumour Purity & Ploidy Estimation
      • Ability to interpret and validate purity estimates from VAF distributions, coverage ratios, and B-allele frequency plots
      • Awareness of purity confounders: clonality, subclonal CNVs, normal contamination
  • Environment: Linux / Shell Scripting
      • Comfortable in HPC/cloud environments; proficiency in bash scripting, job scheduling, and debugging pipeline failures
  • Version Control & Reproducibility
      • Git-based development workflow; familiarity with container-based reproducibility (Docker/Singularity)
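
To make the link between VAF and purity concrete, here is a minimal sketch of the standard expected-VAF relationship for a clonal somatic variant. It is an illustration only, not a production estimator: it assumes a single clonal variant, a known local tumour copy number, and a normal copy number of 2, and the function names `expected_vaf` and `purity_from_vaf` are hypothetical.

```python
def expected_vaf(purity: float, tumour_cn: int, multiplicity: int, normal_cn: int = 2) -> float:
    """Expected VAF of a clonal somatic variant.

    VAF = p * m / (p * C_t + (1 - p) * C_n), where p is purity, m is the
    number of mutated copies, and C_t / C_n are tumour / normal copy number.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    mutant_copies = purity * multiplicity
    total_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    return mutant_copies / total_copies


def purity_from_vaf(vaf: float, tumour_cn: int = 2, multiplicity: int = 1, normal_cn: int = 2) -> float:
    """Invert the relationship above to get purity from an observed VAF."""
    denominator = multiplicity - vaf * (tumour_cn - normal_cn)
    if denominator <= 0:
        raise ValueError("VAF is inconsistent with the assumed copy-number state")
    return vaf * normal_cn / denominator


# Heterozygous clonal SNV in a copy-neutral diploid region: purity is about 2 * VAF.
diploid_purity = purity_from_vaf(0.30)

# Round trip in a gained region: 3 tumour copies, 2 of them mutated, purity 0.5.
gain_vaf = expected_vaf(purity=0.5, tumour_cn=3, multiplicity=2)
gain_purity = purity_from_vaf(gain_vaf, tumour_cn=3, multiplicity=2)
print(diploid_purity, gain_vaf, gain_purity)
```

The confounders listed above (subclonality, subclonal CNVs, normal contamination) are exactly the cases where these single-variant assumptions break down, which is why purity is usually fitted jointly across many variants and segments.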

Strongly Preferred

  • R: Statistical analysis, ggplot2 visualisation, and custom QC report generation; experience with data.table or tidyverse for large genomic data frames
  • Nextflow DSL2: Pipeline authoring, module development, and execution on AWS (HealthOmics, Batch) or SLURM/HPC
  • Germline Variant Calling: Understanding of germline SNV/indel pipelines (GATK HaplotypeCaller, DeepVariant) to contextualise somatic filtering strategies and germline contamination detection
  • Benchmarking Methodology: Experience designing truth-set-based evaluations; familiarity with SEQC2, GIAB, or synthetic mixture datasets; knowledge of ROC/precision-recall frameworks for variant callers
  • AWS Cloud: S3 data management, Batch job orchestration, IAM and cost-control practices in a bioinformatics context

Soft Skills & Work Style

  • Documentation discipline: Ability to write clear, complete analysis logs, parameter rationale docs, and pipeline runbooks without prompting
  • Scientific communication: Comfortable presenting complex results to technical and non-technical audiences alike, with clean, well-labelled figures
  • Literature fluency: Stays current with primary literature, identifies methodologically sound papers, and flags relevant advances proactively
  • Intellectual rigour: Questions assumptions, checks edge cases, raises concerns early; prefers correctness over speed
  • Project alignment: Deeply understands the goals of the project before coding begins; asks clarifying questions rather than making undocumented assumptions


Job ID: 148923161
