HCLTech

Speech Language Specialist

  • Posted 6 hours ago

Job Description

Job Title: Speech AI Engineer

Location: Noida

Experience: 6+ years

Role summary

We are looking for a senior hands-on expert who can take speech systems from raw audio to reliable production features. You will build and improve core speech capabilities such as ASR, TTS, voice conversion, and speech-to-speech workflows, and you will also own the engineering work that makes them fast, scalable, and measurable in the real world.

This role is a strong fit if you enjoy the full stack of speech AI: signal processing intuition, modern deep learning, decoding and streaming constraints, and practical deployment trade-offs.

What you will own

1) Speech modeling that ships

  • Build, train, and iterate on ASR models for real-world conditions such as conversational speech, accents, noise, and far-field audio, with strong offline and online evaluation discipline.
  • Develop and improve TTS systems that sound natural, run at low latency, and keep speaker identity and prosody stable under production inference constraints.
  • Work on voice conversion and accent conversion when needed, preserving intelligibility, naturalness, and speaker identity in streaming settings.

2) Decoder and streaming engineering

  • Design and implement decoding stacks using proven libraries such as Kaldi and OpenFst, with features such as custom vocabulary injection, language model rescoring, and beam search tuning.
  • Build streaming inference systems with strict latency budgets and predictable behavior at scale, including monitoring and continuous improvement loops.

3) Speech analysis and speech intelligence

  • Deliver speech analytics building blocks such as VAD, diarization, speaker recognition, and quality analytics that improve end-to-end product outcomes.
  • Design robust evaluation harnesses and datasets for real user scenarios, including domain adaptation and behavior tuning across use cases.

4) GenAI and LLM integration for voice experiences

  • Integrate speech components into LLM-based systems, including cascaded ASR plus LLM plus TTS pipelines, and drive joint optimization where it materially improves product quality.
  • Build or extend speech generation capabilities including voice cloning, controllable prosody, and modern generative architectures where relevant to the roadmap.

5) Production deployment and operational excellence

  • Own end-to-end delivery: prototyping, ablations, training, evaluation, optimization, deployment, and post-launch monitoring.
  • Partner closely with product and platform teams to integrate models into real-time systems and maintain reliability, uptime, and quality under production traffic.

Required qualifications

  • 6+ years building production-grade speech or audio ML systems, or equivalent depth through research plus shipped production impact.
  • Strong programming ability in Python, plus comfort in C or C++ for performance-critical components.
  • Proven expertise in deep learning for speech (PyTorch or TensorFlow) and practical model training and serving.
  • Solid fundamentals in speech and audio, including signal processing concepts and real-world acoustic variability.
  • Experience deploying models into real-time or high-throughput systems, including evaluation, scalability, and production reliability.

Strongly preferred

  • Hands-on experience with decoding toolchains and speech customization, including WFST concepts, beam search, and LM rescoring.
  • Experience with conversational or telephony speech systems, where latency, robustness, and product polish matter more than leaderboard wins.
  • Experience with generative speech systems such as voice cloning, flow matching, diffusion or autoregressive Transformers, and model optimization for real-time inference.
  • Familiarity with modern speech stacks and frameworks such as NVIDIA NeMo (or comparable) for ASR and TTS workflows.
  • Publications or strong open-source contributions in speech and audio AI.

Freshers, kindly do not apply for this job.

Job ID: 144628335