Job Description
Senior Data Scientist – 4+ Years' Experience
Role Overview
We are looking for a Senior Data Scientist with 4+ years of experience and strong hands-on expertise working with Transformer-based models beyond API usage. This role sits between research and engineering, focusing on understanding, training, modifying, and improving models rather than simply integrating them.
The ideal candidate is comfortable working deep inside model architectures, training pipelines, fine-tuning methods, and inference optimization, with a strong first-principles mindset.
Eligibility Requirement (Read Carefully)
Applicants must already have prior hands-on experience training or modifying Transformer-based models or related systems, either open-source or in-house.
Candidates whose experience is limited to using hosted APIs or prompting models without working at the training or architecture level should not apply.
Must Have
Advanced Understanding of Transformer Architectures
Deep theoretical and implementation-level understanding of Transformers, including:
Encoder–Decoder and Decoder-only architectures
Attention mechanisms and positional encodings
Training dynamics and scaling behavior
Strong understanding of common limitations such as context length constraints, efficiency bottlenecks, and hallucinations, along with approaches to mitigate them.
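For illustration only (not an additional requirement): by "implementation-level understanding of attention" we mean the ability to write the mechanism from first principles, along the lines of this minimal scaled dot-product attention sketch in PyTorch. Shapes and names here are illustrative, not a prescribed interface.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from attention.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # rows sum to 1
    return weights @ v, weights

q = k = v = torch.randn(1, 5, 16)   # (batch, seq_len, d_k), toy sizes
out, attn = scaled_dot_product_attention(q, k, v)
```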
Intermediate-Level PEFT Expertise
Practical experience with parameter-efficient fine-tuning techniques, including:
LoRA
QLoRA
Adapter-based methods
Clear understanding of trade-offs between PEFT approaches and full fine-tuning.
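To make the expectation concrete (a sketch, not a required library or API): "hands-on PEFT" means being able to implement the technique itself, e.g. a minimal LoRA wrapper that freezes a pretrained linear layer and learns a low-rank update. The class name, rank, and alpha below are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: freeze the base weight, learn a low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: no change at step 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(64, 64), rank=4)
out = layer(torch.randn(2, 64))
```

Because B is zero-initialized, the wrapped layer is exactly equivalent to the frozen base layer at the start of training, while only the low-rank factors receive gradients.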
Model Training and Modification (Mandatory)
Hands-on experience with:
Training or fine-tuning models from checkpoints or from scratch
Implementing and customizing training loops
Designing or modifying loss functions and optimization strategies
Fine-tuning without reliance on hosted APIs
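As a rough illustration of what "implementing and customizing training loops" and "designing loss functions" cover (the model, data, and penalty below are stand-ins on a toy regression task, not a prescribed stack):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and data; in practice this would be a Transformer checkpoint.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

def custom_loss(pred, target, l2_weight=1e-4):
    # Example of a customized objective: MSE plus an explicit L2 penalty.
    mse = nn.functional.mse_loss(pred, target)
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return mse + l2_weight * l2

x, y = torch.randn(256, 10), torch.randn(256, 1)
losses = []
for step in range(50):
    opt.zero_grad()
    loss = custom_loss(model(x), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # instability guard
    opt.step()
    losses.append(loss.item())
```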
Core Engineering and Research Skills
Strong experience with PyTorch (preferred)
GPU training workflows and performance debugging
Ability to read and implement research papers
Experience diagnosing training instability and model failures
Designing experiments and evaluating model behavior
Key Responsibilities
Analyze and improve Transformer architectures and training strategies
Train and fine-tune models using custom pipelines
Implement optimization techniques such as mixed precision, quantization, and pruning
Improve inference efficiency across latency, memory, and throughput
Run hypothesis-driven experiments and document findings
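As one concrete example of the optimization work above (a sketch under the assumption of a CPU deployment target, not the team's actual pipeline): post-training dynamic quantization in PyTorch stores linear-layer weights in int8, cutting memory roughly 4x while keeping outputs close to the fp32 model. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# Toy fp32 model; in practice this would be a trained Transformer.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# Dynamic quantization: int8 weights, activations quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(4, 128)
with torch.no_grad():
    fp32_out = model(x)
    int8_out = quantized(x)
# int8_out approximates fp32_out; the quantized model is smaller and
# typically faster for CPU inference.
```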
Good to Have
Experience with multimodal or generative models, including:
Diffusion models
Vision or audio transformers
Image, video, or audio generation systems
Additional strengths include:
Experience modifying model architectures, attention mechanisms, or training objectives
Familiarity with efficient attention implementations
Contributions to open-source machine learning or independent research projects
Ideal Candidate
Thinks like a researcher and builds like an engineer
Curious about why models fail, not just how to use them
Comfortable experimenting, iterating, and improving systems
Prefers deep understanding over black-box usage