
Role: Speech ML Engineer
Function: Machine Learning / Speech Engineering
Location: Mumbai
Type: Full-time
Industry: Consumer AI
About Company
The company is building the AI layer for Bharat at India-scale. It is backed by partnerships with global tech leaders like Meta and Google.
The platform is engineered from day one for 100M+ users, with latency, cost, reliability, and safety constraints designed to hold at 1B users. The team combines deep India-first AI capability with unmatched India-scale distribution.
The culture emphasises engineering excellence, strong collaboration, and tangible impact across sectors that matter to India.
Position Overview
We are looking for a Speech ML Engineer to own state-of-the-art speech-to-text (STT) and text-to-speech (TTS) modeling for a multilingual, multimodal consumer AI app serving 100M+ Indian users. You will design and ship production-grade streaming speech architectures, from acoustic modeling to vocoder pipelines, optimized for India's linguistic diversity and real-world latency constraints. This is a high-ownership individual contributor (IC) role where your work ships at scale; it is not a research sandbox.
Role & Responsibilities
• Design, train, and ship production STT models (Whisper, Conformer-based) for Indian languages with low-latency streaming inference
• Build and optimize TTS pipelines, including acoustic models, neural vocoders (HiFi-GAN), and end-to-end models (VITS), for natural, low-latency speech synthesis
• Architect real-time streaming speech pipelines with end-to-end latency targets under 300 ms for live assistant interactions
• Own model evaluation harnesses covering WER, MOS, latency, and robustness across noisy Indian acoustic environments
• Fine-tune and adapt multilingual speech models on proprietary Indian language datasets across 10+ languages
• Collaborate with the Core Intelligence platform team to integrate speech I/O into the agent orchestration and memory layers
• Drive model compression, quantization, and hardware-aware optimization for on-device and edge deployment scenarios
Must Have Criteria
• 1–7 years of hands-on ML engineering experience with a focus on speech/audio models
• Shipped at least one production STT or TTS model serving real users at scale (not research or demo-only)
• Hands-on experience training or fine-tuning Whisper or Conformer-based ASR architectures
• Hands-on experience with neural vocoders or end-to-end TTS models (HiFi-GAN, VITS, or equivalent) in production TTS systems
• Experience building streaming speech inference pipelines with real-time latency constraints
• Proficiency in Python and PyTorch for model training, fine-tuning, and serving
• Experience with speech data pipelines—preprocessing, augmentation, and evaluation at scale
Nice to Have
• Experience with Indian language ASR/TTS across Hindi, Tamil, Telugu, Bengali, or other Indic languages
• Prior work at a consumer AI, voice assistant, or speech-tech product company (e.g., Sarvam, Krutrim, Deepgram, ElevenLabs)
• Experience with on-device or edge model deployment using ONNX, TensorRT, or CoreML
• Familiarity with speaker diarization, voice activity detection, or speaker adaptation techniques
• Open-source contributions to speech ML projects or published work in ASR/TTS
What We Offer
• Opportunity to build speech infrastructure for 100M+ Indian users across 10+ languages
• High-autonomy IC role with direct ownership of production systems—not a support function
• Work alongside a world-class AI team backed by Meta and Google partnerships
• Fast iteration loops: prototype to scaled rollout in weeks, not quarters
• Competitive compensation with meaningful equity in a category-defining consumer AI platform
Job ID: 149071079