Platform Design and Architecture: Build and operate a highly available, scalable, modular AI platform using technologies such as Qdrant, Anyscale, and Ray to support LLM orchestration, vector search, and multi-agent frameworks.
Core Infrastructure Development: Build essential APIs and infrastructure to power conversational applications, AI agents, and analytics tools.
LLM Operational Solutions: Implement workflows for large language models, including inference pipelines, fine-tuning, caching, and evaluation for both open-weight and hosted models.
Deployment & Performance Optimization: Deploy AI services on AWS with Kubernetes (EKS), Lambda, and ECS, ensuring scalability and resilience while optimizing vector databases and model runtimes for cost and performance.
Collaboration, Governance, & Mentorship: Partner with engineering and research teams to deliver production-grade, self-healing, performance-optimized services for AI/RAG pipelines; establish governance and security standards; and mentor junior engineers on AI infrastructure best practices and reviews.
What We're Looking For (Minimum Qualifications)
8+ years of experience as a Platform Engineer (or Site Reliability/DevOps Engineer), including at least 3 years in AI/ML platform development (MLOps).
Deep expertise in Python, with strong design and debugging skills.
Ability to work independently and lead complex projects, with excellent problem-solving, analytical, and communication skills.
Proficiency with cloud platforms such as AWS, GCP, or Azure; familiarity with MLOps/AI DevOps tools such as MLflow or Kubeflow; and proficiency in CI/CD and infrastructure as code (Terraform/CloudFormation).
Hands-on expertise with CI/CD pipelines, model observability, and incident response for AI/ML services.
Preferred Qualifications
Experience implementing and optimizing platforms that support large language model (LLM) pipelines with frameworks such as LangChain, LlamaIndex, Hugging Face Transformers, or similar.
Hands-on experience setting up and scaling vector database platforms such as Qdrant (or alternatives like Pinecone or Weaviate) for semantic search and embeddings management.
Exposure to MLOps tools and to Ray, Anyscale, or other distributed orchestration and inference frameworks.
Experience developing and deploying containerized applications using Docker and Kubernetes, including Helm charts and automated scaling.
Understanding of LLMOps patterns such as model registries, prompt versioning, and feedback loops.