Data Scientist, Recommender Systems
Location: Bengaluru (Hybrid)
Role Summary
We're seeking a skilled Data Scientist with deep expertise in recommender systems to design and deploy scalable personalization solutions. This role blends research, experimentation, and production-level implementation, with a focus on content-based and multi-modal recommendations using deep learning and cloud-native tools.
Responsibilities
- Research, prototype, and implement recommendation models, including two-tower, multi-tower, and cross-encoder architectures
- Utilize text/image embeddings (CLIP, ViT, BERT) for content-based retrieval and matching
- Conduct semantic similarity analysis and deploy vector-based retrieval systems (FAISS, Qdrant, ScaNN)
- Perform large-scale data preparation and feature engineering with Spark/PySpark on Dataproc
- Build ML pipelines with Vertex AI and Kubeflow, orchestrated on GKE
- Evaluate models with recommender metrics (nDCG, Recall@K, Hit Rate, MAP) and offline evaluation frameworks
- Validate model improvements through A/B testing and deploy models for real-time serving on Cloud Run or Vertex AI
- Address cold-start challenges using metadata and multi-modal inputs
- Collaborate with engineering teams on CI/CD, monitoring, and embedding lifecycle management
- Stay current with trends in LLM-powered ranking, hybrid retrieval, and personalization
Required Skills
- Proficiency in Python and its data/ML ecosystem: pandas, Polars, NumPy, scikit-learn, TensorFlow, PyTorch, and Hugging Face Transformers
- Hands-on experience with deep learning frameworks for recommender systems
- Solid grounding in embedding-based retrieval strategies and approximate nearest neighbor (ANN) search
- Experience with GCP-native workflows: Vertex AI, Dataproc, Dataflow, Pub/Sub, Cloud Functions, Cloud Run
- Strong foundation in semantic search, user modeling, and personalization techniques
- Familiarity with MLOps best practices, including CI/CD, infrastructure automation, and monitoring
- Experience deploying models in production using containerized environments and Kubernetes
Nice to Have
- Knowledge of ranking models such as DLRM, and of learning-to-rank with XGBoost or LightGBM
- Multi-modal retrieval experience (text + image + tabular features)
- Exposure to LLM-powered personalization or hybrid recommendation systems
- Understanding of real-time model updates and streaming ingestion