gnani.ai

LLM R&D Specialist

  • Posted 14 days ago

Job Description

IndiaAI is building India's next-gen foundational LLMs. We're looking for a hands-on Senior ML Engineer experienced in large-scale pre-training, distributed GPU systems, and data creation pipelines. You will work with Megatron-LM, NVIDIA NeMo, DeepSpeed, PyTorch Distributed, and SLURM to train 7B to 70B+ models on multi-node GPU clusters.

What You'll Do

  • Build & optimize LLM pre-training pipelines (7B to 70B+).
  • Implement distributed training using PyTorch Distributed (FSDP), DeepSpeed (ZeRO), Megatron-LM, and NVIDIA NeMo.
  • Manage multi-node GPU jobs via SLURM and optimize NCCL communication.
  • Lead large-scale data creation, cleaning, deduplication, tokenization & sharding for multilingual datasets (with focus on Indian languages).
  • Build high-throughput dataloaders, monitoring dashboards & training workflows.
  • Collaborate with infra teams to optimize GPU utilization, networking, and storage systems.

What You Bring

  • 5+ years in ML Engineering / DL Systems.
  • Prior experience training large transformer models (ideally 7B+).
  • Strong in NeMo, Megatron-LM, DeepSpeed, PyTorch Distributed.
  • Experience with SLURM & multi-node GPU clusters (A100/H100).
  • Understanding of transformer internals (attention, RoPE, FlashAttention, parallelism).
  • Experience in data pipelines: cleaning, dataset assembly, and tokenization.

Bonus Skills

  • Indic-language data experience
  • MoE training
  • Kernel-level optimization (Triton/CUDA)
  • Open-source contributions (Megatron, NeMo, DeepSpeed, PyTorch)

Apply now to help build India's national-scale foundational AI models.

Job ID: 139483601