
About us:
Granules India Ltd. is a fully integrated pharmaceutical manufacturer. The Company manufactures Active Pharmaceutical Ingredients (APIs), Pharmaceutical Formulation Intermediates (PFIs) and Finished Dosages (FDs) which are distributed in over 80 countries. We're building Nyra (Internal Product), an emotionally intelligent conversational AI with real-time voice and avatar that feel indistinguishably human. We're looking for a hands-on AI Engineer / Researcher to build speech-first multimodal models that generate natural, emotional, low-latency audio - and align it with visual avatars in real time. This role sits at the core of our product.
Role Summary:
We're hiring a Lead ML Researcher to own and drive foundational work in speech-first multimodal conversational AI. You'll lead the design of models that learn from user-avatar interactions and predict, control, and generate real-time speech, visual, and language responses with human-level realism. Your work will define how Nyra sounds, behaves, and emotionally connects with users. This isn't a role for someone who follows a roadmap. You'll set the technical direction, make key architectural decisions, and shape what the next generation of emotionally intelligent AI human avatars should feel like.
Your Mission:
Build and fine-tune human-level TTS systems with emotion, prosody, and non-verbal expressions (laughter, emphasis, shouting, pauses).
Design real-time, streaming speech pipelines optimized for low latency and natural turn-taking.
Develop multimodal generation models that align speech, language, and avatar behavior.
Work with diffusion-based audio/video models and long-form generation.
Collaborate with Applied ML to ship research into production.
You'll Be Great At This If You Have:
A PhD (completed or near completion) or Master's in a relevant field, or equivalent hands-on research experience.
Strong experience with speech generation models (TTS / neural audio).
Hands-on experience fine-tuning TTS models to produce human-like emotion, prosody, and expressive non-verbal cues.
Deep understanding of why synthetic speech sounds artificial - and how to fix it.
Experience training, fine-tuning, and deploying AI models with real-time streaming enabled (not just demos).
Solid foundations in generative modeling.
Proficiency in PyTorch and GPU-based inference.
Experience with image/video generation models or strong interest in learning fast.
Nice-to-Haves:
Real-time or streaming audio systems or conversational agents/avatars.
Experience building or fine-tuning Indic language speech models that sound natural and expressive.
Diffusion models (audio or video).
Talking heads, neural avatars, or audio-driven animation.
Familiarity with software engineering best practices.
Publications in top-tier or respected venues (CVPR, NeurIPS, ICASSP, Interspeech, etc.).
Goal:
Users should not feel like they're talking to an AI - just someone real.
Location & Engagement:
This is a full-time, on-site role based in Hyderabad. Candidates must be willing to work closely with key stakeholders, AI engineers, ML specialists, and creative teams to bring the vision to life.
Benefits & Culture:
When you join Granules India Ltd, you're joining a diverse and supportive team. Our work is driven by our people, and our success is shared by all. This position has a flexible work schedule, competitive healthcare, and gear stipends, as well as plenty of fun. At the end of the day, we want Granules to be a place for you to learn, directly drive impact, and work with a team you love.
To learn more about our team culture and benefits, check out our hiring page.
Granules is growing fast, and we'd like you to grow with us. If you're excited to get your hands dirty and help make machines more human, drop your resume and we'll be in touch.
We are not looking for cultural fits; we are looking for culture creators. Diversity is what drives our success. It's at the core of how we hire, communicate, and work. We are inclusive of all and combine our diverse backgrounds, skill sets, and perspectives to build the best experiences for our clients.
Job ID: 139882965