We are a fast-scaling organization in the Enterprise AI & Machine Learning sector, focused on building production-grade Large Language Model (LLM) solutions, Retrieval-Augmented Generation (RAG) systems, and real-time intelligent search for B2B customers. The team delivers low-latency inference, scalable vector search, and robust MLOps for mission-critical applications.
Primary role title: Senior Machine Learning Engineer (LLM & RAG)
Location: Pune, India (On-site)
Role & Responsibilities
- Design, build, and productionize end-to-end LLM & RAG pipelines: data ingestion, embedding generation, vector indexing, retrieval, and inference integration (a minimal retrieval sketch follows this list).
- Implement and optimize vector search solutions using FAISS/Pinecone and integrate with prompt orchestration frameworks (e.g., LangChain).
- Optimize model serving for latency and cost: batching, quantization, ONNX/Triton deployment, and autoscaling on Kubernetes (see the quantization sketch below).
- Develop robust microservices and REST/gRPC APIs to expose inference and retrieval capabilities to product teams (see the API sketch below).
- Establish CI/CD, monitoring, and observability for ML models and pipelines (model validation, drift detection, alerting); see the drift-check sketch below.
- Collaborate with data scientists and platform engineers to iterate on model architectures, embeddings, and prompt strategies; mentor junior engineers.
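Illustrative sketch (not a requirement of the role): a minimal retrieval flow of the kind described above, assuming a sentence-transformers embedding model and an in-memory FAISS index. The corpus, model name, and prompt format are placeholders.

    # Minimal RAG retrieval sketch; corpus, model name, and prompt are illustrative placeholders.
    import faiss
    from sentence_transformers import SentenceTransformer

    docs = [
        "Invoices are processed within 3 business days.",
        "Enterprise plans include a dedicated support channel.",
        "Vector indexes are rebuilt nightly from the document store.",
    ]
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    doc_vecs = encoder.encode(docs, normalize_embeddings=True).astype("float32")

    # Inner product over normalized vectors is equivalent to cosine similarity.
    index = faiss.IndexFlatIP(doc_vecs.shape[1])
    index.add(doc_vecs)

    query = "How fast are invoices handled?"
    q_vec = encoder.encode([query], normalize_embeddings=True).astype("float32")
    scores, ids = index.search(q_vec, 2)  # top-2 passages
    context = "\n".join(docs[i] for i in ids[0])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    print(prompt)  # in production this prompt would go to the LLM via the orchestration layer (e.g., LangChain)

In a production pipeline the index would typically live in a managed store such as Pinecone, with prompt assembly handled by the orchestration framework rather than inline code.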
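Illustrative sketch: dynamic INT8 quantization of a Hugging Face classifier for cheaper CPU inference. The checkpoint name is only an example; a production path would more likely export to ONNX and serve via Triton, as noted above.

    # Dynamic quantization sketch; the checkpoint below is an example, not a project model.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_id = "distilbert-base-uncased-finetuned-sst-2-english"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

    # Convert Linear layer weights to int8; activations stay fp32 (dynamic quantization).
    quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

    inputs = tokenizer("Latency looks fine after quantization.", return_tensors="pt")
    with torch.no_grad():
        logits = quantized(**inputs).logits
    print(logits.argmax(dim=-1).item())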
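Illustrative sketch: a minimal FastAPI endpoint exposing a retrieval-plus-inference call. The route, payload schema, and stubbed response are assumptions, not an existing service contract.

    # Minimal REST sketch; retriever and LLM calls are stubbed so the example stays self-contained.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI(title="rag-inference-api")  # service name is hypothetical

    class Query(BaseModel):
        question: str
        top_k: int = 3

    @app.post("/v1/answer")
    def answer(query: Query) -> dict:
        # In production this would call the vector store (FAISS/Pinecone) and the LLM client.
        passages = [f"passage-{i}" for i in range(query.top_k)]
        return {"question": query.question, "context": passages, "answer": "stubbed response"}

    # Run locally with: uvicorn app:app --reload  (module name 'app' is an assumption)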
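Illustrative sketch: a simple two-sample Kolmogorov-Smirnov drift check of the sort a monitoring job might run. The reference window, live window, and alert threshold are synthetic placeholders.

    # Feature-drift check sketch; data and threshold are synthetic placeholders.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # e.g., a feature captured at deployment time
    live = rng.normal(loc=0.3, scale=1.0, size=5_000)       # e.g., the same feature from recent traffic

    stat, p_value = ks_2samp(reference, live)
    if p_value < 0.01:  # assumed alerting threshold
        print(f"Drift suspected: KS={stat:.3f}, p={p_value:.2e}")  # would raise an alert in production
    else:
        print("No significant drift detected.")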
Skills & Qualifications
Must-Have
- PyTorch
- Hugging Face Transformers
- LangChain
- Retrieval-Augmented Generation
- FAISS
- Pinecone
- Docker
- Kubernetes
Preferred
- Triton Inference Server
- Apache Kafka
- Model quantization
Qualifications: 6–9 years of hands-on experience in ML/LLM engineering with a strong track record of shipping production ML systems. Comfortable working on-site in Pune. Strong software engineering fundamentals and experience collaborating across product and data teams.
Benefits & Culture Highlights
- Opportunity to lead end-to-end LLM projects and shape AI product direction in a growth-stage engineering team.
- Collaborative, fast-paced environment with mentorship, tech ownership, and exposure to modern MLOps tooling.
- Competitive compensation, professional development budget, and on-site engineering culture in Pune.
To apply, bring strong LLM production experience, demonstrable RAG implementations, and a bias for scalable, maintainable systems. Join an engineering-first team building the next generation of AI-powered enterprise features.
Skills: LLM, RAG, AI/ML