SDE - II (Conversational AI)

TrueFan AI

Gurugram, India

5-7 Years

Posted 17 hours ago

Job Description

You will own the backend pipeline that brings our real-time avatars to life by orchestrating audio, video, and AI/ML components into a seamless, low-latency, multi-tenant system serving enterprise clients at scale. You will lead a small backend team and partner closely with the product manager, AI/ML engineers, conversational AI designers, and DevOps to translate product vision into a production-grade architecture.

This is a hands-on senior role with leadership scope. You will be the technical anchor for the pipeline: making High-Level Design (HLD) and Low-Level Design (LLD) decisions, writing production code in Python, integrating streaming media stacks, hardening the system for concurrency and multi-tenancy, and setting the bar for engineering quality. You are equal parts architect, builder, and mentor: someone who has shipped scalable systems before and knows what it takes to operate them reliably for enterprise customers.

Responsibilities

Pipeline Architecture and Ownership: Own the end-to-end real-time avatar pipeline, from media ingestion and ASR through LLM inference and TTS to lip-synced video streaming back to the client. Define the HLD and LLD, make build-vs-buy decisions, and ensure the pipeline integrates cleanly into the broader product architecture.
Streaming Media Engineering: Design and implement the audio/video streaming layer using technologies such as WebRTC, HLS, LiveKit, and FFmpeg. Optimise for low latency and for resilience to jitter, packet loss, and bandwidth variability across enterprise network conditions.
Scalable and Multi-Tenant System Design: Architect the pipeline to handle multiple concurrent sessions across multiple enterprise tenants, with strict isolation, fair resource allocation, and predictable performance. Design for horizontal scalability, graceful degradation, and zero-downtime deployments.
AI/ML Pipeline Integration: Work alongside AI/ML engineers to integrate inference services (LLM, lip-sync, avatar rendering) into the pipeline. Drive decisions on model serving, batching, queuing, and GPU resource utilisation to balance latency, throughput, and cost.
Multi-Cloud Infrastructure Strategy: Design the pipeline to be cloud-agnostic and deployable across multiple cloud providers, with clear abstractions for compute, storage, networking, and GPU resources. Guide DevOps on infrastructure choices, GPU provisioning, container orchestration, and cost optimisation.
Observability, Logging and Dashboarding: Establish robust logging, metrics, tracing, and dashboarding standards across the pipeline. Define SLIs/SLOs, build alerting that catches issues before customers do, and create dashboards that give the team and stakeholders real-time visibility into system health and business metrics.
Team Leadership and Code Quality: Lead a small backend engineering team by setting technical direction, reviewing code, mentoring engineers, and raising the bar on engineering practices, including testing, documentation, code reviews, and on-call hygiene.
Stakeholder Collaboration: Partner with the product manager, AI/ML team, conversational AI designers, DevOps, and enterprise client teams to translate requirements into technical solutions, communicate trade-offs clearly, and ship on committed timelines.

Requirements

5 to 6 years of backend engineering experience, with a strong track record of building and operating scalable, production-grade systems for enterprise customers.
Prior experience working on conversational AI products is a must, including voice agents, chatbots, virtual assistants, or similar real-time dialogue systems.
Strong proficiency in Python, including async programming, performance profiling, and writing clean, maintainable production code.
Hands-on experience with real-time streaming media technologies such as WebRTC, HLS, LiveKit, and FFmpeg, including codecs, transport protocols, and media server architectures.
Deep understanding of distributed systems concepts: concurrency, multi-tenancy, queuing, caching, load balancing, fault tolerance, and consistency models.
Strong grasp of HLD and LLD, with the ability to design systems from first principles, document architecture clearly, and make sound trade-off decisions.
Experience designing and integrating with AI/ML inference pipelines, including awareness of GPU infrastructure, model serving patterns, and latency-sensitive workloads.
Working knowledge of multi-cloud deployment (AWS, GCP, Azure) and container orchestration (Docker, Kubernetes), enough to guide DevOps decisions even if not owning them directly.
Hands-on experience setting up logging, metrics, tracing, and dashboarding using tools such as Grafana, Prometheus, ELK, Datadog, or equivalents.
Demonstrated experience leading a small team, mentoring engineers, and owning technical outcomes for a product area.
Strong communication skills, with the ability to articulate architectural decisions and trade-offs to both technical and non-technical stakeholders.

Preferred Qualifications

Prior experience shipping real-time voice or video AI products at enterprise scale.
Experience with multi-tenant SaaS architectures and enterprise compliance requirements such as data isolation, audit logging, SOC 2, and ISO.
Familiarity with message queues and event streaming systems (Kafka, RabbitMQ, Redis Streams).
Experience optimising GPU utilisation, batching strategies, or model-serving frameworks (Triton, vLLM, TorchServe).
Background in low-latency systems engineering, including sub-second end-to-end latency targets, network optimisation, and edge deployment.
Experience working in regulated industries such as BFSI, telecom, or healthcare, with high standards for accuracy, security, and compliance.

This job was posted by Nikhil Tewari from TrueFan AI.