Job Purpose
Design and architect end-to-end AI Cloud platforms with a focus on security, cost-efficiency, and performance. This position involves direct client engagement to translate requirements into technical solutions, encompassing GPU infrastructure right-sizing and optimal model selection. We are looking for a cloud expert with a demonstrated ability to take complex AI models from concept to large-scale production. The ideal candidate brings extensive experience in AI/Cloud ecosystems and a successful track record of architecting and managing production-grade, large-scale AI platforms.
Role Summary
Key Responsibilities
- Translate business requirements into scalable, high-performance AI/GenAI architectures featuring NVIDIA GPU clusters.
- Design end-to-end AI Cloud and next-generation platforms optimized for deep learning workloads and distributed training.
- Architect HPC cluster topologies utilizing high-speed InfiniBand (NDR/HDR) and RoCE v2 interconnects for low-latency communication.
- Right-size platform components, including GPUs, CPUs, memory and NVMe storage for comprehensive client proposals.
- Architect distributed training and inference environments optimized for MPI frameworks and workload scheduling via Slurm.
- Design scalable container orchestration platforms using Kubernetes and Kubeflow to manage AI workloads.
- Propose optimized inference strategies using vLLM, Triton, and TensorRT-LLM to meet specific latency and throughput KPIs.
- Bring hands-on experience with RAG systems and multi-agent orchestration frameworks, such as LangGraph, and related agentic ecosystems.
- Develop private AI cloud environments focused on data sovereignty and regulatory compliance, such as India's Digital Personal Data Protection (DPDP) Act.
- Define integration strategies for LLMs and open-source models within existing enterprise data systems, APIs, and knowledge graphs.
- Establish reference architectures for CI/CD/CT pipelines and automated model retraining workflows to ensure reproducibility.
- Implement automation and observability frameworks for monitoring GPU utilization, performance tuning, and failure handling.
- Drive technical validation through Proof of Concept (PoC) engagements, focusing on scalability and performance benchmarks for LLM training.
- Establish Infrastructure-as-Code (IaC) practices to ensure reproducible and reliable cluster deployments.
- Collaborate with C-suite stakeholders and cross-functional teams to drive technical decision-making, innovation, and roadmap alignment.
Experience & Educational Requirements
Qualifications and Experience
EDUCATIONAL QUALIFICATIONS: (degree, training, or certification required)
BE/B.Tech or equivalent in Computer Science or Electronics & Communication
RELEVANT EXPERIENCE: 15-20 years of IT experience, with a minimum of 5 years in AI platforms
Required Technical Skills
Core AI/ML Expertise
- Strong experience with NVIDIA, Intel, and Google GPU/accelerator architectures and InfiniBand
- Strong expertise in Kubernetes, Slurm and OpenShift
- Good experience in Python, PyTorch and TensorFlow
- Good knowledge of LangChain and LangGraph
- Deep understanding of Transformers, Attention mechanisms, Diffusion, MoE
- Knowledge of RLHF, vector stores (Pinecone, FAISS, Chroma), OpenAI APIs, and vLLM
- Expertise in RAG and agentic AI workflows
- Knowledge of high-performance storage (Lustre and other parallel file systems, object storage, NVMe)
- Good knowledge of NVIDIA GPU architectures (Hopper, Blackwell)
Soft Skills
- Strong problem-solving and analytical thinking
- Excellent communication and stakeholder management
- Ability to influence leadership and drive strategic decisions
- Innovation mindset with focus on enterprise impact
Preferred Experience
- Currently working in an AI/Cloud presales team.
- Ability to right-size infrastructure and choose the right GPU model for each client's requirements.
- Hands-on with Python, vector DBs (Pinecone, FAISS, Chroma), and LLM APIs (OpenAI, Anthropic).
- Solid understanding of cloud-native architecture: OpenStack, KVM, public clouds (Azure/AWS/GCP), microservices, Kubernetes, serverless, and API gateways.
- Good knowledge of deep learning: CNNs, RNNs/LSTMs, Transformers, and attention mechanisms.
- Proficiency in Python for ML: NumPy, pandas, scikit-learn, and frameworks such as PyTorch or TensorFlow.
- Experience integrating LLMs (GPT, Claude, Gemini, LLaMA, Mistral) into applications.
- Prompt engineering skills: zero-shot, few-shot, chain-of-thought, ReAct, and structured output patterns.
- Experience building RAG systems: document chunking, embedding models, vector search, and retrieval optimization.
- Understanding of AI agent patterns, tool use, and agentic workflows.
- Familiarity with Docker, CI/CD pipelines, and Git-based workflows.
- Strong communication, stakeholder management, and solution design skills.