Role Summary
As a Mid-Level Infrastructure Manager, you will be responsible for the lifecycle management of our private cloud environment built on VMware Cloud Foundation (VCF). You will bridge the gap between traditional server and database administration and modern cloud-native operations. A key focus will be preparing our on-premises infrastructure to host AI/ML workloads, GPU-accelerated clusters, and Kubernetes (Tanzu/VKS) in support of our evolving S/4HANA and utility operations.
Key Responsibilities
- VCF & SDDC Management: Administer the full VMware Cloud Foundation stack, including vSphere, vSAN, and NSX. Perform lifecycle management (LCM) using SDDC Manager to ensure the environment stays compliant and patched.
- Kubernetes & Containers: Deploy and manage Kubernetes clusters (VMware vSphere with Tanzu/VKS). Support the containerization of auxiliary utility services and integration with CI/CD pipelines.
- AI & GPU Orchestration: Configure and manage NVIDIA vGPU profiles within the virtual environment. Ensure high-performance compute resources are available for AI model inference and data-heavy utility analytics.
- Database Infrastructure: Support the underlying infrastructure for SAP HANA and other enterprise databases, ensuring that compute and storage are sized and configured to meet high availability (HA) and disaster recovery (DR) requirements.
- Automation & IaC: Shift from manual provisioning to Infrastructure as Code (IaC) using tools like Ansible, Terraform, or VMware Aria Automation to streamline server deployments.
- Monitoring & Optimization: Use VMware Aria Operations (formerly vROps) to monitor system health, including GPU utilization and container pod performance.
Experience & Skill Requirements
- Virtualization: Deep hands-on experience with VMware vSphere 8.x. Proven experience with at least one VCF deployment or significant upgrade.
- Servers & Storage: Experience with enterprise-grade hardware (e.g., Dell VxRail, HPE Synergy) and Software-Defined Storage (vSAN).
- Databases: Solid understanding of infrastructure requirements for SAP HANA under Tailored Datacenter Integration (TDI) standards, including memory management and persistent storage.
- Kubernetes (K8s): Experience with Tanzu or vanilla K8s. Ability to troubleshoot Pods, Nodes, and Ingress controllers.
- GPU Management: Familiarity with NVIDIA AI Enterprise or vGPU software. Understanding of how to allocate fractional GPU resources to VMs.
- Linux Mastery: Strong proficiency in RHEL or SLES (the backbone of S/4HANA and AI containers).
Preferred Certifications
- VMware Certified Professional (VCP), Data Center or Cloud Management track.
- Certified Kubernetes Administrator (CKA).
- NVIDIA Certified Associate (AI Infrastructure).