
recrew ai

Speech ML Engineer

  • Posted 7 hours ago

Job Description

Role: Speech ML Engineer

Function: Machine Learning / Speech Engineering

Location: Mumbai

Type: Full-time

Industry: Consumer AI

About Company

The company is building the AI layer for Bharat at India-scale. It is backed by partnerships with global tech leaders like Meta and Google.

The platform is engineered from day one for 100M+ users, under 1B-ready constraints on latency, cost, reliability, and safety. The team combines deep India-first AI capability with unmatched India-scale distribution.

The culture emphasises engineering excellence, strong collaboration, and tangible impact across sectors that matter to India.

Position Overview

We are looking for a Speech ML Engineer to own state-of-the-art speech-to-text (STT) and text-to-speech (TTS) modeling for a multilingual, multimodal consumer AI app serving 100M+ Indian users. You will design and ship production-grade streaming speech architectures, from acoustic modeling to vocoder pipelines, optimized for India's linguistic diversity and real-world latency constraints. This is a high-ownership individual contributor (IC) role where your work ships at scale; it is not a research sandbox.

Role & Responsibilities

• Design, train, and ship production STT models (Whisper- or Conformer-based) for Indian languages with low-latency streaming inference

• Build and optimize TTS pipelines including acoustic models and neural vocoders (HiFi-GAN, VITS) for natural, low-latency speech synthesis

• Architect real-time streaming speech pipelines with end-to-end latency targets under 300 ms for live assistant interactions

• Own model evaluation harnesses covering WER, MOS, latency, and robustness across noisy Indian acoustic environments

• Fine-tune and adapt multilingual speech models on proprietary Indian language datasets across 10+ languages

• Collaborate with the Core Intelligence platform team to integrate speech I/O into the agent orchestration and memory layers

• Drive model compression, quantization, and hardware-aware optimization for on-device and edge deployment scenarios

Must Have Criteria

• 1–7 years of hands-on ML engineering experience with a focus on speech/audio models

• Shipped at least one production STT or TTS model serving real users at scale (not research or demo-only)

• Hands-on experience training or fine-tuning Whisper or Conformer-based ASR architectures

• Hands-on experience with neural vocoders (HiFi-GAN, VITS, or equivalent) for production TTS systems

• Experience building streaming speech inference pipelines with real-time latency constraints

• Proficiency in Python and PyTorch for model training, fine-tuning, and serving

• Experience with speech data pipelines—preprocessing, augmentation, and evaluation at scale

Nice to Have

• Experience with Indian language ASR/TTS across Hindi, Tamil, Telugu, Bengali, or other Indic languages

• Prior work at a consumer AI, voice assistant, or speech-tech product company (e.g., Sarvam, Krutrim, Deepgram, ElevenLabs)

• Experience with on-device or edge model deployment using ONNX, TensorRT, or CoreML

• Familiarity with speaker diarization, voice activity detection, or speaker adaptation techniques

• Open-source contributions to speech ML projects or published work in ASR/TTS

What We Offer

• Opportunity to build speech infrastructure for 100M+ Indian users across 10+ languages

• High-autonomy IC role with direct ownership of production systems—not a support function

• Work alongside a world-class AI team backed by Meta and Google partnerships

• Fast iteration loops: prototype to scaled rollout in weeks, not quarters

• Competitive compensation with meaningful equity in a category-defining consumer AI platform

More Info

Job ID: 149071079