Job Title:
AZURE-TECH-ARCHITECTURE AI Infrastructure (Private Cloud)
Location:
Bangalore / Pune, India
Experience:
15+ Years
Employment Type:
Permanent
Role Overview
We are seeking an experienced Lead Solutions Architect with deep expertise in AI/ML infrastructure, High-Performance Computing (HPC), and container platforms. This role focuses on architecting, deploying, and optimizing enterprise-grade private cloud AI environments capable of supporting large-scale AI workloads.
The ideal candidate will bring strong technical leadership across AI infrastructure, Kubernetes platforms, hybrid cloud environments, and GPU-accelerated systems, and will play a key role in delivering scalable, secure, and high-performance AI solutions.
Key Responsibilities
Leadership & Strategy
- Act as the lead design authority for enterprise AI and container platforms
- Provide delivery assurance across Private Cloud AI and AI Factory solutions
- Align architecture with modern AI infrastructure principles including modular scalability, GPU optimization, and hybrid cloud orchestration
- Oversee planning, risk management, and stakeholder alignment across the project lifecycle
Solution Planning & Design
- Architect end-to-end solutions across container orchestration and HPC workload management
- Design and optimize platforms using Kubernetes, OpenShift, Rancher, Slurm, and PBS Pro
- Ensure seamless integration with AI platforms, DevOps/MLOps tools, and open-source frameworks
Opportunity Assessment
- Lead technical responses to RFPs, RFIs, and customer engagements
- Conduct Proof of Concept (PoC) activities to validate performance and scalability
- Assess customer infrastructure and recommend optimal reference architectures
Innovation & Research
- Stay current with emerging trends in AI infrastructure, HPC, Kubernetes, hybrid cloud, and security
- Contribute to solution innovation and best practices
Customer-Centric Engagement
- Act as a trusted advisor to enterprise customers
- Translate complex technical designs into clear business value
Team Collaboration
- Work closely with infrastructure, cloud, networking, and data science teams
- Mentor architects and consultants and contribute to internal knowledge-sharing initiatives
Required Skills
HPC & AI Infrastructure
- Extensive experience with HPC environments and workload schedulers (Slurm, PBS Pro)
- Expertise in HPC cluster management tools
- Strong understanding of high-speed networking (InfiniBand, Ethernet) and performance tuning
- Experience with GPU platforms and monitoring tools
Containerization & Orchestration
- Hands-on experience with Docker, Podman, and Singularity
- Proficiency in at least two of the following:
  - Kubernetes (CNCF)
  - Red Hat OpenShift
  - SUSE Rancher
  - RKE / K3s / Charmed Kubernetes
- Experience with GPU Operator and GPU monitoring tools
Operating Systems & Virtualization
- Strong Linux administration experience (RHEL, SLES, Ubuntu)
- Expertise in system performance tuning and troubleshooting
- Experience with KVM and container-based virtualization
Cloud, DevOps & Architecture
- Strong understanding of architecture patterns and styles
- Experience with Infrastructure as Code, CI/CD, and automation tools
- Exposure to microservices, application modernization, and hybrid cloud architectures
Nice to Have
- Experience creating architecture diagrams and technical design documents
- Exposure to MLOps pipelines and AI lifecycle management