Job Description
Senior Data Scientist – 4+ Years' Experience
Role Overview
We are looking for a Senior Data Scientist with 4+ years of experience and strong hands-on expertise working with Transformer-based models beyond API usage. This role sits between research and engineering, focusing on understanding, training, modifying, and improving models rather than simply integrating them.
The ideal candidate is comfortable working deep inside model architectures, training pipelines, fine-tuning methods, and inference optimization, with a strong first-principles mindset.
Eligibility Requirement (Read Carefully)
Applicants must already have prior hands-on experience training or modifying Transformer-based models or related systems, either open-source or in-house.
Candidates whose experience is limited to using hosted APIs or prompting models without working at the training or architecture level should not apply.
Must Have
Advanced Understanding of Transformer Architectures
Deep theoretical and implementation-level understanding of Transformers, including:
Encoder–Decoder and Decoder-only architectures
Attention mechanisms and positional encodings
Training dynamics and scaling behavior
Strong understanding of common limitations such as context length constraints, efficiency bottlenecks, and hallucinations, along with approaches to mitigate them.
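For illustration only (not an additional requirement): by "implementation-level understanding of attention" we mean the ability to write the mechanism from first principles, along the lines of this minimal scaled dot-product attention sketch in PyTorch. Shapes and names here are illustrative, not a prescribed interface.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from attention.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # rows sum to 1
    return weights @ v, weights

q = k = v = torch.randn(1, 5, 16)   # (batch, seq_len, d_k), toy sizes
out, attn = scaled_dot_product_attention(q, k, v)
```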
Intermediate-Level PEFT Expertise
Practical experience with parameter-efficient fine-tuning techniques, including:
LoRA
QLoRA
Adapter-based methods
Clear understanding of trade-offs between PEFT approaches and full fine-tuning.
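To make the expectation concrete (a sketch, not a required library or API): "hands-on PEFT" means being able to implement the technique itself, e.g. a minimal LoRA wrapper that freezes a pretrained linear layer and learns a low-rank update. The class name, rank, and alpha below are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: freeze the base weight, learn a low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: no change at step 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(64, 64), rank=4)
out = layer(torch.randn(2, 64))
```

Because B is zero-initialized, the wrapped layer is exactly equivalent to the frozen base layer at the start of training, while only the low-rank factors receive gradients.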
Model Training and Modification (Mandatory)
Hands-on experience with:
Training or fine-tuning models from checkpoints or from scratch
Implementing and customizing training loops
Designing or modifying loss functions and optimization strategies
Fine-tuning without reliance on hosted APIs
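As a rough illustration of what "implementing and customizing training loops" and "designing loss functions" cover (the model, data, and penalty below are stand-ins on a toy regression task, not a prescribed stack):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and data; in practice this would be a Transformer checkpoint.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

def custom_loss(pred, target, l2_weight=1e-4):
    # Example of a customized objective: MSE plus an explicit L2 penalty.
    mse = nn.functional.mse_loss(pred, target)
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return mse + l2_weight * l2

x, y = torch.randn(256, 10), torch.randn(256, 1)
losses = []
for step in range(50):
    opt.zero_grad()
    loss = custom_loss(model(x), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # instability guard
    opt.step()
    losses.append(loss.item())
```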
Core Engineering and Research Skills
Strong experience with PyTorch (preferred)
GPU training workflows and performance debugging
Ability to read and implement research papers
Experience diagnosing training instability and model failures
Designing experiments and evaluating model behavior
Key Responsibilities
Analyze and improve Transformer architectures and training strategies
Train and fine-tune models using custom pipelines
Implement optimization techniques such as mixed precision, quantization, and pruning
Improve inference efficiency across latency, memory, and throughput
Run hypothesis-driven experiments and document findings
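As one concrete example of the optimization work above (a sketch under the assumption of a CPU deployment target, not the team's actual pipeline): post-training dynamic quantization in PyTorch stores linear-layer weights in int8, cutting memory roughly 4x while keeping outputs close to the fp32 model. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# Toy fp32 model; in practice this would be a trained Transformer.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# Dynamic quantization: int8 weights, activations quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(4, 128)
with torch.no_grad():
    fp32_out = model(x)
    int8_out = quantized(x)
# int8_out approximates fp32_out; the quantized model is smaller and
# typically faster for CPU inference.
```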
Good to Have
Experience with multimodal or generative models, including:
Diffusion models
Vision or audio transformers
Image, video, or audio generation systems
Additional strengths include:
Experience modifying model architectures, attention mechanisms, or training objectives
Familiarity with efficient attention implementations
Contributions to open-source machine learning or independent research projects
Ideal Candidate
Thinks like a researcher and builds like an engineer
Curious about why models fail, not just how to use them
Comfortable experimenting, iterating, and improving systems
Prefers deep understanding over black-box usage