
What You Will Be Doing
- Design and implement production-ready generative AI applications that serve millions of users, from initial architecture through deployment and monitoring
- Build advanced RAG (Retrieval-Augmented Generation) pipelines that combine vector databases, hybrid search, and intelligent caching to deliver sub-second response times
- Develop multimodal AI systems that seamlessly integrate text, vision, and audio capabilities using state-of-the-art models
- Architect scalable microservices that handle thousands of concurrent AI requests while optimizing for cost, latency, and reliability
- Lead code reviews and technical design sessions, establishing best practices and architectural patterns that elevate the entire team's capabilities
- Optimize large language models through fine-tuning techniques to achieve domain-specific performance improvements
- Implement comprehensive MLOps practices including automated testing, model versioning, A/B testing frameworks, and real-time monitoring dashboards
- Collaborate with product managers and stakeholders to translate complex business requirements into innovative AI solutions
- Deploy AI models on cloud platforms (GCP) using containerization and orchestration technologies
- Create and maintain technical documentation, runbooks, and architectural decision records that enable knowledge sharing across teams
- Mentor junior engineers through pair programming, technical talks, and hands-on guidance to accelerate their growth
- Research and prototype emerging AI technologies to identify opportunities for competitive advantage
Gen AI Responsibilities
- Fine-tune and optimize state-of-the-art language models for specific business use cases, achieving significant improvements in accuracy and relevance
- Design multi-agent AI systems using orchestration frameworks such as LangGraph and Google ADK to coordinate complex workflows and decision-making processes
- Implement advanced prompt engineering strategies including Tree of Thoughts, ReAct patterns, and automatic prompt optimization to maximize model performance
- Build production-grade embedding systems that handle billions of vectors, implementing efficient indexing strategies and hybrid search capabilities
- Develop computer vision pipelines for tasks ranging from object detection to visual question answering
- Create secure AI applications with robust safeguards against prompt injection, jailbreaking, and data leakage while maintaining compliance with AI governance standards
- Optimize token usage and implement intelligent caching strategies to reduce costs by 50-70% while maintaining quality
- Design and implement evaluation frameworks that go beyond traditional metrics, incorporating human feedback loops and domain-specific quality measures
- Build real-time AI inference systems capable of processing streaming data with sub-100ms latency requirements
- Integrate multiple foundation models into unified applications, implementing fallback mechanisms and load balancing for high availability
- Develop custom tools and functions that extend LLM capabilities, enabling models to interact with databases, APIs, and external systems
- Implement advanced RAG techniques including contextual embeddings, cross-encoder reranking, and Graph RAG for complex reasoning tasks
- Create multimodal search systems that enable users to query across text, images, and documents using natural language
- Build AI-powered data processing pipelines that automatically extract, transform, and enrich unstructured data at scale
- Deploy edge AI solutions using frameworks like ONNX and TensorRT, optimizing models for resource-constrained environments
What We're Looking For
- 5+ years of hands-on experience building and deploying ML/AI systems, including at least 2 years focused on generative AI and LLMs
- Expert-level Python programming skills with deep knowledge of async programming, multiprocessing, and performance optimization
- Strong experience with modern AI frameworks including PyTorch, Transformers, LangChain, and vector databases
- Proven track record of deploying AI applications to production environments serving real users at scale
- Deep understanding of transformer architectures, attention mechanisms, and the latest advances in generative AI
- Experience with cloud platforms (GCP) and containerization technologies (Docker, Kubernetes)
- Excellent communication skills with the ability to explain complex AI concepts to both technical and non-technical audiences
- Proven experience improving large-scale product search and discovery — including dense retrieval with bi-encoders, cross-encoder reranking, query understanding, and hybrid BM25 + vector search across catalogs of tens of millions of SKUs
- Hands-on experience building and deploying production multi-agent systems using orchestration frameworks such as LangGraph and Google ADK — designing stateful, tool-augmented agents for complex, real-world workflows
- Bachelor's degree in Computer Science, Mathematics, or a related field (Master's preferred, but not required for candidates with relevant experience)
Nice to Have
- Published research papers or significant contributions to open-source AI projects
- Experience with multimodal AI systems combining vision, language, and audio
- Domain expertise in specific verticals (healthcare, finance, legal, e-commerce)
- Knowledge of AI safety, alignment, and constitutional AI principles
- Experience building AI infrastructure and platforms used by other engineers
- Familiarity with emerging technologies like neural architecture search, mixture of experts, or neuromorphic computing
Job ID: 147506127
Skills:
MLOps, PyTorch, Docker, Kubernetes, Python, LangChain, orchestration frameworks, multi-agent systems, generative AI, vector databases, Transformers, large language models, AI frameworks, cloud platforms (GCP)