Machine Learning Engineer - Retrieval & Fine-Tuning
Location: Bangalore
Experience: 3+ Years
About Pocket FM
Pocket FM, founded in 2018, is India's leading audio storytelling platform, transforming the way millions consume stories. Offering high-quality serialized content across genres such as Romance, Drama, Thriller, Fantasy, Sci-Fi, and Mythology in eight languages, Pocket FM has built a strong global presence with over 200 million listeners worldwide. With users spending an average of 120 minutes daily on the platform, it has emerged as one of the fastest-growing audio platforms, rapidly expanding its reach across the US, Europe, LATAM, and Southeast Asia.
Role Overview
We are seeking a Machine Learning Engineer specializing in retrieval systems and model fine-tuning to join our team. In this role, you will architect and optimize retrieval-augmented generation (RAG) pipelines, build and maintain semantic search infrastructure, and fine-tune large language models and embedding models for domain-specific applications. You will work at the intersection of information retrieval and modern NLP, ensuring our AI systems surface the most relevant, accurate, and context-rich information to power intelligent products.
Key Responsibilities
- Design, build, and optimize end-to-end retrieval-augmented generation (RAG) pipelines for production applications.
- Develop and manage semantic search systems using vector databases, embedding models, and hybrid retrieval strategies (dense + sparse).
- Fine-tune large language models (LLMs) and embedding models on domain-specific datasets using techniques such as LoRA, QLoRA, PEFT, and full fine-tuning.
- Curate, clean, and prepare high-quality training datasets for fine-tuning, including synthetic data generation and data augmentation strategies.
- Implement advanced chunking, indexing, and re-ranking strategies to maximize retrieval precision and recall.
- Evaluate retrieval and generation quality using metrics such as MRR, NDCG, recall@k, faithfulness, and answer relevancy.
- Build and maintain experiment tracking workflows for fine-tuning runs, including hyperparameter sweeps and ablation studies.
- Optimize inference latency and cost for retrieval and generation components, including quantization, caching, and batching.
- Collaborate with product and domain teams to define retrieval requirements and integrate ML systems into user-facing features.
- Stay current with emerging research in retrieval, fine-tuning, and LLM optimization, and drive adoption of best practices.
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Machine Learning, NLP, Information Retrieval, or a related field.
- 3+ years of professional experience in ML engineering with a focus on NLP, search, or retrieval systems.
- Hands-on experience building and deploying RAG pipelines or semantic search systems in production.
- Demonstrated experience fine-tuning LLMs or embedding models (e.g., using Hugging Face Transformers, OpenAI fine-tuning API, or Axolotl).
- Strong proficiency in Python and deep learning frameworks such as PyTorch or TensorFlow.
- Working knowledge of vector databases (Pinecone, Weaviate, Qdrant, Milvus, pgvector, or similar).
- Solid understanding of transformer architectures, attention mechanisms, tokenization, and embedding spaces.
- Experience with text preprocessing, chunking strategies, and document parsing for unstructured data.
- Familiarity with cloud platforms (AWS, GCP, or Azure) and GPU-accelerated training environments.
- Strong analytical skills with the ability to design rigorous evaluation frameworks for retrieval and generation quality.
Preferred Qualifications
- Experience with parameter-efficient fine-tuning methods (LoRA, QLoRA, Adapters, Prefix Tuning).
- Familiarity with RLHF, DPO, or other alignment and preference-based training techniques.
- Hands-on experience with advanced retrieval techniques: hybrid search, HyDE, query expansion, multi-hop retrieval, or agentic RAG.
- Knowledge of re-ranking models (cross-encoders, ColBERT) and learned sparse retrieval (SPLADE).
- Experience with knowledge graph integration or structured data retrieval alongside unstructured text.
- Familiarity with model quantization (GPTQ, AWQ, GGUF) and efficient serving frameworks (vLLM, TGI, TensorRT-LLM).
- Published research or open-source contributions in information retrieval, NLP, or LLM fine-tuning.
- Experience with evaluation frameworks like RAGAS, LangSmith, or custom LLM-as-judge pipelines.
You can get more updates, insights, and everything behind the scenes at Pocket FM here - Pocket FM