About AiSensy
AiSensy is a WhatsApp-based Marketing & Engagement platform helping businesses like Adani, Delhi Transport Corporation, Yakult, Godrej, Aditya Birla Hindalco, Wipro, Asian Paints, India Today Group, Skullcandy, Vivo, Physicswallah, and Cosco grow their revenues via WhatsApp.
- Enabling 210,000+ Businesses with WhatsApp Engagement & Marketing
- 800 Crores+ WhatsApp Messages exchanged between Businesses and Users via AiSensy per year
- Working with top brands like Delhi Transport Corporation, Vivo, Physicswallah & more
- High impact: businesses drive 25-80% of their revenue using the AiSensy platform
- Mission-Driven and Growth Stage Startup backed by Marsshot.vc, Bluelotus.vc & 50+ Angel Investors
About the Role
You will own the ML systems behind AiSensy's conversational AI stack — serving 200,000+ SMBs across India. This is a deep-IC role with significant architectural influence. You will work directly with senior engineering leadership on systems currently being benchmarked against Intercom Fin and Chatbase.
This is not a research role. You will ship to production, own latency and cost budgets, and be measured on whether real bots stop failing.
Core Responsibilities
1. Conversational AI
- Own the end-to-end ML pipeline for Conversational AI: retrieval quality, tool-calling routing, guardrails, and response synthesis.
- Design and tune hybrid retrieval (BM25 + ColBERT + dense embeddings) on Vector Databases. Build retrieval quality gates that catch failures before they hit users.
- Work across the LangGraph + DSPy orchestration layer (or equivalent orchestration frameworks) on prompt isolation, capability discovery, tool-path selection, and structured-output reliability.
- Evolve the three-tier memory architecture (STM, summary, LTM) spanning Qdrant and Valkey VSS.
- Build guardrail systems (PII detection, advice boundaries, safety) that cleanly separate platform-absolute rules, overridable defaults, and bot-owner rules. No conflation of layers.
- Run rigorous offline and online evals; tie model quality to product KPIs (resolution rate, handoff rate, cost per conversation).
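To make the hybrid-retrieval bullet concrete, here is a minimal sketch of one common way to fuse a lexical (BM25) ranking with a dense-embedding ranking: reciprocal rank fusion. The document IDs and rankings are illustrative, not from any real AiSensy index.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of doc IDs into one hybrid ranking.

    rankings: ranked doc-ID lists (e.g. one from BM25, one from a dense
    retriever). k=60 is the constant commonly used for RRF.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]   # lexical ranking (illustrative)
dense_hits = ["doc_c", "doc_a", "doc_d"]  # embedding ranking (illustrative)
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

A document that ranks well in both lists (here `doc_a`) floats to the top, which is the basic property a hybrid retriever is after; production systems typically layer reranking and quality gates on top of a fusion step like this.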
2. Behavioral ML
- Design and productionize representation-learning models that encode multi-turn conversational behavior into embeddings suitable for downstream clustering, retrieval, and personalization.
- Build multi-level behavioral segmentation pipelines — both cross-tenant behavioral archetypes and tenant-scoped business clusters — with incremental updates that stay fresh as new user data arrives.
- Partner with platform engineering on the feature infrastructure spanning raw event storage, behavioral feature computation, vector storage at scale, and low-latency online feature serving for real-time journey orchestration.
- Own the end-to-end ML lifecycle in production: training, batch and online inference, retrain cadence, drift detection, and safe rollback paths.
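As a toy illustration of "incremental updates that stay fresh": a mini-batch clustering model can absorb new behavioral embeddings without a full retrain. The embeddings below are synthetic; cluster count and dimensionality are invented for the sketch.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# Initial fit on a batch of (synthetic) behavioral embeddings.
model = MiniBatchKMeans(n_clusters=4, random_state=0, n_init=3)
embeddings = rng.normal(size=(500, 32))
model.fit(embeddings)

# Incremental update as new user activity arrives -- no full retrain.
new_batch = rng.normal(size=(50, 32))
model.partial_fit(new_batch)

# Assign a behavioral archetype (cluster ID) to a new user.
archetype = int(model.predict(new_batch[:1])[0])
```

A real pipeline would add drift detection on the cluster assignments and a rollback path when the updated model degrades downstream metrics, per the lifecycle bullet above.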
3. Platform & Production
- Take models from notebook to production on Amazon Bedrock (Nova), SageMaker, and the AWS stack (ECS, ECR, Kubernetes).
- Own latency, cost, and quality budgets — particularly for WhatsApp-scale conversational throughput across multi-tenant workloads.
- Write low-level design documents before implementation; "no LLD, no start" is a team-wide gate. You will also review LLDs from peers.
- Contribute to the evaluation framework that benchmarks us against Intercom Fin, Chatbase, and other category leaders.
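A toy sketch of the kind of budget gate the latency/cost bullet implies: check a tenant's serving stats against budgets before a rollout proceeds. The threshold values and field names here are invented for illustration.

```python
from dataclasses import dataclass

# Invented budgets for illustration only; real values are per-workload.
P95_LATENCY_BUDGET_MS = 2500
COST_BUDGET_USD = 0.02

@dataclass
class ConversationStats:
    tenant_id: str
    p95_latency_ms: float
    cost_per_conversation_usd: float

def within_budget(stats: ConversationStats) -> bool:
    """Deploy gate: both p95 latency and per-conversation cost must pass."""
    return (stats.p95_latency_ms <= P95_LATENCY_BUDGET_MS
            and stats.cost_per_conversation_usd <= COST_BUDGET_USD)

ok = within_budget(ConversationStats("tenant_42", 1800.0, 0.015))
```

The point is that quality work ships only when both budgets hold; in a multi-tenant system a gate like this runs per tenant, not just globally.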
What We're Looking For
Must-Have
- 5+ years building production ML systems, with at least 2 years focused on LLM / conversational AI.
- Deep familiarity with RAG systems — not just wiring them up, but diagnosing retrieval failures, tuning hybrid retrievers, and knowing when not to use RAG.
- Hands-on experience with LangGraph, DSPy, or equivalent LLM orchestration frameworks.
- Strong grasp of vector databases (Qdrant, pgvector, or similar) at non-trivial scale.
- Production experience with a major cloud ML platform (SageMaker preferred; Bedrock, Vertex AI, Azure ML acceptable).
- Solid foundation in classical ML: embedding models, clustering, contrastive learning, evaluation methodology.
- Ability to write LLDs that a senior backend engineer can review and build against.
- Strong Python; experience serving models with FastAPI or a similar framework.
Strong Signals
- You can articulate when an LLM is the wrong tool — e.g., why reaching for RAG on a classification problem is a smell.
- You have shipped multi-tenant ML at SMB scale. Cost per tenant matters to you, not just model accuracy.
- You have tuned MuRIL, IndicBERT, or similar Indic-language models in production.
- Experience with behavioral / sequence modeling for user journeys.
- Full-stack comfort — you can argue about MongoDB vs ClickHouse partitioning, not just model architecture.
Nice-to-Have
- WhatsApp Business API / CPaaS background.
- Contributions to open-source ML tooling.
- Published work on retrieval, conversational AI, or behavioral ML.
How We Evaluate
We value first-principles thinking over pattern-matching. In our interview loop:
- Take-home submissions with runtime crashes are disqualifying. We expect what you submit to run.
- Proposing a generic solution (add RAG, throw an LLM at it) to a problem that does not need it is a signal we read carefully.
- We want engineers who decompose a problem, pick the lightest tool that solves it, and can defend that choice under scrutiny.
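As one illustration of "the lightest tool that solves it": a routing problem with a fixed label set is often a plain supervised classifier, not a RAG pipeline. The intents and training rows below are made up for the sketch.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy intent-routing data (illustrative, not a real dataset).
texts = [
    "where is my order", "track my shipment", "order status please",
    "cancel my subscription", "stop my plan", "unsubscribe me",
    "talk to a human", "connect me to an agent", "need a real person",
]
labels = ["track", "track", "track",
          "cancel", "cancel", "cancel",
          "handoff", "handoff", "handoff"]

# TF-IDF features + a linear classifier: cheap, fast, auditable.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

pred = clf.predict(["please cancel the plan"])[0]
```

A model like this costs microseconds per query and needs no LLM call; reaching for retrieval plus generation here would be exactly the smell the bullets above describe.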