techkareer

Voice AI Engineer

  • Posted 20 hours ago

Job Description

We're helping a Delhi-based AI startup, founded by IIT and IIM alumni, hire a Voice AI Engineer.

About the Role

We are looking for a Voice AI Engineer to build and improve real-time voice agents that can handle natural, reliable, human-like conversations over phone calls. You will work across speech, LLMs, telephony, backend systems, and agent orchestration to build production-grade AI calling experiences.

This role is ideal for someone who has hands-on experience building voice agents, understands the latency and reliability challenges of real-time audio, and can own systems end-to-end from prototype to production.

Responsibilities

  • Build and maintain real-time Voice AI agents for inbound and outbound calls.
  • Work with speech-to-text, text-to-speech, LLMs, and voice orchestration pipelines.
  • Integrate with telephony platforms such as Twilio, Plivo, Exotel, Vonage, or similar.
  • Design low-latency conversation flows with interruption handling, turn-taking, retries, fallbacks, and human handoff.
  • Improve agent quality across:
      • Latency
      • Accuracy
      • Naturalness
      • Conversation completion rate
      • User satisfaction
  • Build backend services for call handling, logging, analytics, and agent state management.
  • Integrate voice agents with CRMs, EMRs, scheduling tools, internal dashboards, databases, and APIs.
  • Create evaluation frameworks for voice agent performance, including call success rate, hallucination rate, escalation rate, and transcription quality.
  • Debug production issues related to audio streams, WebSockets, telephony failures, latency spikes, and LLM behavior.
  • Collaborate with product, design, and operations teams to ship reliable voice experiences for real users.

Requirements

  • 1+ years of hands-on engineering experience.
  • Experience building Voice AI agents, conversational AI systems, or real-time audio applications.
  • Strong backend engineering skills in Python, Node.js, TypeScript, or Go.
  • Experience working with LLM APIs such as OpenAI, Anthropic, Gemini, or open-source models.
  • Familiarity with speech-to-text and text-to-speech systems such as Deepgram, AssemblyAI, Whisper, ElevenLabs, Cartesia, PlayHT, Azure Speech, Google Speech, or similar.
  • Experience with WebSockets, streaming APIs, async processing, and event-driven systems.
  • Understanding of real-time voice challenges such as:
      • Latency optimization
      • Barge-in / interruption handling
      • Silence detection
      • Voice activity detection
      • Turn detection
      • Audio quality issues
      • Call drops and retries
  • Ability to write clean, reliable, production-ready code.
  • Strong debugging skills and comfort working with ambiguous problems.

Good to Have

  • Experience with Twilio Media Streams, SIP, IVR systems, call routing, or PSTN infrastructure.
  • Experience building voice agents using frameworks such as LiveKit Agents, Vapi, Retell, Bland, Pipecat, Daily, or similar.
  • Experience with RAG, tool calling, agentic workflows, and structured outputs.
  • Experience with healthcare, fintech, logistics, customer support, sales, or operations voice workflows.
  • Experience building dashboards for call review, QA, analytics, and human-in-the-loop correction.
  • Knowledge of prompt engineering, conversation design, and LLM evaluation.
  • Experience deploying systems on AWS, GCP, Azure, Railway, Render, or similar.
  • Familiarity with compliance-sensitive environments and frameworks such as HIPAA, SOC 2, or GDPR.

What You'll Work On

  • AI agents that can make and receive phone calls.
  • Real-time conversation systems with low latency.
  • Voice workflows for scheduling, reminders, support, qualification, follow-ups, and data collection.
  • Integrations with internal tools and third-party platforms.
  • Evaluation systems to continuously improve agent reliability.
  • Infrastructure that can scale from demos to production usage.

Ideal Candidate

You are someone who has actually built voice agents or real-time conversational systems, not just chatbots. You understand that voice AI is different from text AI because users expect fast responses, smooth turn-taking, natural speech, and high reliability.

You are comfortable working across the full stack: audio streaming, LLMs, backend APIs, telephony, databases, and deployment. You enjoy debugging hard production issues and improving systems through fast iteration.

Job ID: 148317979

Gurugram, Gurugram, India

Skills:

MLOps, LiveKit, Pipecat, Voice AI