Machine Learning Engineer - Retrieval & Fine-Tuning
Location: Bangalore
Experience: 3+ Years
About Pocket FM
Pocket FM, founded in 2018, is India's leading audio storytelling platform, transforming the way millions consume stories. Offering high-quality serialized content across genres such as Romance, Drama, Thriller, Fantasy, Sci-Fi, and Mythology in eight languages, Pocket FM has built a strong global presence with over 200 million listeners worldwide. With users spending an average of 120 minutes daily on the platform, it has emerged as one of the fastest-growing audio platforms, rapidly expanding its reach across the US, Europe, LATAM, and Southeast Asia.
Role Overview
We are seeking a Machine Learning Engineer specializing in retrieval systems and model fine-tuning to join our team. In this role, you will architect and optimize retrieval-augmented generation (RAG) pipelines, build and maintain semantic search infrastructure, and fine-tune large language models and embedding models for domain-specific applications. You will work at the intersection of information retrieval and modern NLP, ensuring our AI systems surface the most relevant, accurate, and context-rich information to power intelligent products.
Key Responsibilities
- Design, build, and optimize end-to-end retrieval-augmented generation (RAG) pipelines for production applications.
- Develop and manage semantic search systems using vector databases, embedding models, and hybrid retrieval strategies (dense + sparse).
- Fine-tune large language models (LLMs) and embedding models on domain-specific datasets using techniques such as LoRA, QLoRA, PEFT, and full fine-tuning.
- Curate, clean, and prepare high-quality training datasets for fine-tuning, including synthetic data generation and data augmentation strategies.
- Implement advanced chunking, indexing, and re-ranking strategies to maximize retrieval precision and recall.
- Evaluate retrieval and generation quality using metrics such as MRR, NDCG, recall@k, faithfulness, and answer relevancy.
- Build and maintain experiment tracking workflows for fine-tuning runs, including hyperparameter sweeps and ablation studies.
- Optimize inference latency and cost for retrieval and generation components, including quantization, caching, and batching.
- Collaborate with product and domain teams to define retrieval requirements and integrate ML systems into user-facing features.
- Stay current with emerging research in retrieval, fine-tuning, and LLM optimization, and drive adoption of best practices.
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Machine Learning, NLP, Information Retrieval, or a related field.
- 3+ years of professional experience in ML engineering with a focus on NLP, search, or retrieval systems.
- Hands-on experience building and deploying RAG pipelines or semantic search systems in production.
- Demonstrated experience fine-tuning LLMs or embedding models (e.g., using Hugging Face Transformers, OpenAI fine-tuning API, or Axolotl).
- Strong proficiency in Python and deep learning frameworks such as PyTorch or TensorFlow.
- Working knowledge of vector databases (Pinecone, Weaviate, Qdrant, Milvus, pgvector, or similar).
- Solid understanding of transformer architectures, attention mechanisms, tokenization, and embedding spaces.
- Experience with text preprocessing, chunking strategies, and document parsing for unstructured data.
- Familiarity with cloud platforms (AWS, GCP, or Azure) and GPU-accelerated training environments.
- Strong analytical skills with the ability to design rigorous evaluation frameworks for retrieval and generation quality.
Preferred Qualifications
- Experience with parameter-efficient fine-tuning methods (LoRA, QLoRA, Adapters, Prefix Tuning).
- Familiarity with RLHF, DPO, or other alignment and preference-based training techniques.
- Hands-on experience with advanced retrieval techniques: hybrid search, HyDE, query expansion, multi-hop retrieval, or agentic RAG.
- Knowledge of re-ranking models (cross-encoders, ColBERT) and learned sparse retrieval (SPLADE).
- Experience with knowledge graph integration or structured data retrieval alongside unstructured text.
- Familiarity with model quantization (GPTQ, AWQ, GGUF) and efficient serving frameworks (vLLM, TGI, TensorRT-LLM).
- Published research or open-source contributions in information retrieval, NLP, or LLM fine-tuning.
- Experience with evaluation frameworks like RAGAS, LangSmith, or custom LLM-as-judge pipelines.
You can get more updates, insights, and everything behind the scenes at Pocket FM here - Pocket FM