Position summary: The Cloud Operations Manager will lead operations, maintenance, and strategic management of data center and hybrid infrastructure to ensure high availability, performance, security, and compliance. This role combines hands-on infrastructure leadership, cloud migration experience, and team management to support mission-critical environments.
Quick Highlights
- Lead a technical operations team supporting on-prem, private, and hybrid cloud environments.
- Drive cloud migrations, disaster recovery, automation, and operational excellence.
- Balance vendor, budget, security, and SLA responsibilities for reliable service delivery.
Responsibilities
- Lead & develop the team: Hire, coach, schedule, and grow the data center operations staff; run performance reviews and career planning.
- Run day-to-day operations: Ensure reliable, available, and high-performing servers, storage, network, power, and cooling systems.
- Set policy & standards: Create and enforce operational procedures, security controls, and compliance standards.
- Cloud & hybrid migrations: Plan and execute cloud and hybrid integration projects (on-prem, private cloud, Azure, AWS, Microsoft 365); work with cloud services and identity integrations (Azure AD, IAM, SAML).
- Monitoring & optimization: Monitor performance, capacity, and costs; tune systems using Prometheus, Grafana, ELK/EFK, Splunk, Nagios, or SolarWinds.
- Disaster recovery & BCP: Design, test, and maintain backup and recovery plans using tools such as Veeam, Commvault, or Rubrik, with cloud DR where appropriate.
- Procurement & asset lifecycle: Manage hardware procurement, installations, upgrades, decommissioning, and vendor coordination; experience with HPE, Dell EMC, NetApp, Pure Storage, and DCIM tools is beneficial.
- Vendor, budget & SLA management: Oversee vendor relationships, contracts, budgets, and SLAs to deliver cost-effective, reliable services.
- Security & compliance: Implement controls and collaborate with security teams on segmentation, firewalls (Palo Alto, Check Point), IAM, and endpoint protections.
- Automation & IaC: Drive automation to improve consistency and efficiency using Terraform, Ansible, Puppet, or Chef, and integrate CI/CD into operations where useful (Jenkins, GitLab CI).
- Virtualization & containers: Manage virtualization and container strategies (VMware vSphere, Hyper-V, KVM, Nutanix, Docker, Kubernetes on EKS/AKS/GKE) and the workload lifecycle.
- Networking & load balancing: Ensure resilient network architecture with Cisco, Juniper, and Arista equipment and load balancers (F5, HAProxy); manage SAN/NAS protocols (Fibre Channel, iSCSI, NFS, SMB).
- Continuous improvement: Evaluate new technologies and lead initiatives to increase resilience, automation, and operational efficiency.
Qualifications
Required
- Bachelor's degree in IT, Computer Science, Engineering, or equivalent practical experience.
- 8+ years in data center operations, infrastructure management, or IT operations.
- Proven experience with private and hybrid infrastructure and leading cloud migration efforts (Azure, AWS, Microsoft 365).
- Hands-on knowledge of servers, enterprise storage, networking, power/cooling, and virtualization (VMware, Hyper-V, KVM).
- Experience with networking and security (Cisco, Juniper, Arista, Palo Alto, Check Point, F5).
- Familiarity with monitoring and automation tools (Prometheus, Grafana, ELK, Nagios, Ansible, Terraform).
- Backup and DR experience (Veeam, Commvault, Rubrik) and cloud DR planning.
- Scripting proficiency (PowerShell, Python, Bash) and Infrastructure as Code experience.
- Experience managing vendors, budgets, and contracts; strong stakeholder communication skills.
- Solid understanding of IT security, compliance, and risk management.
Preferred
- 10+ years in similar or large-scale enterprise roles.
- Relevant certifications: CDCP, ITIL, PMP, AWS/Azure/GCP cloud certs, VMware, Cisco.
- Deep hands-on experience with virtualization, enterprise storage, DCIM, and hybrid networking (VPN, Direct Connect/ExpressRoute).
- Experience applying IaC and CI/CD practices to operations, and with advanced observability/logging (ELK, Splunk, Prometheus/Grafana).
Preferred Skills & Certifications
- Data center certifications (CDCP), ITIL, PMP or cloud certs (AWS, Azure, GCP); vendor certs a plus (VMware, Cisco).
- Experience with backup, DR, and business continuity (Veeam, Commvault, Rubrik).
- Virtualization and enterprise storage expertise; SAN/NAS protocol knowledge.
- Familiarity with DCIM (Schneider, Nlyte) and hardware management (iLO, iDRAC, IPMI).
- Container and orchestration experience (Docker, Kubernetes on EKS/AKS/GKE).
- Strong scripting and automation skills (PowerShell, Python, Bash) and IaC tools (Ansible, Terraform).
- Monitoring/logging experience with Prometheus, Grafana, ELK, and Splunk; knowledge of network and security tooling.
Skills: IT operations, cloud infrastructure, data center operations