Overview
We are seeking a highly experienced and motivated Senior Manager, Infrastructure as a Service (IaaS), to lead IaaS engineering and operations across the Compute, Storage, and Backup Chapters. This role is accountable for delivering resilient, scalable, and cost-effective IaaS platforms that support mission-critical business workloads.
The Sr. Manager will lead multiple cross-functional teams, including Linux and Windows Operations, Patching Operations, Storage Operations, and Backup Operations, with responsibility for environments exceeding 10,000 servers. These teams operate using Site Reliability Engineering (SRE) principles, emphasizing reliability, automation, error reduction, and measurable service outcomes. The role also champions the adoption of modern Generative AI capabilities to improve operational efficiency, such as intelligent remediation, insight-driven capacity planning, automated documentation, and accelerated incident analysis.
This role supports the execution of time-bound data center exit and workload migration programs, requiring tight coordination across Architecture, Engineering, Platform, and Application teams enterprise-wide. The Sr. Manager owns execution of our IaaS Strategy, with accountability for large-scale platform modernization initiatives, including infrastructure modernization and migrations.
Key Responsibilities Include
- Lead and develop IaaS teams across Linux/Windows Operations, Patching, Storage, and Backup, including hiring, coaching, performance management, and succession planning
- Apply SRE practices to infrastructure operations, defining reliability targets, error budgets, runbooks, and continuous improvement mechanisms
- Drive automation and self-service using modern tooling and Generative AI capabilities to reduce incidents, accelerate recovery, and improve operational insights
- Support enterprise infrastructure modernization initiatives, including workload migrations, platform transitions, and resiliency improvements
- Ensure operational excellence through strong incident management, SLA/SLO adherence, patch compliance, data protection, and recovery readiness
- Partner with Finance and vendors to manage capacity forecasting, cost optimization, and vendor strategy
- Serve as an escalation point for infrastructure risk, availability, and resiliency
- Provide leadership and oversight for 24x7 IaaS operational support, ensuring platform stability, availability, and rapid issue resolution across compute, storage, and backup services
Qualifications
- 15+ years of progressive infrastructure experience, including several years in people leadership roles overseeing enterprise Compute, Storage, and Backup platforms in large-scale environments (10,000+ servers).
- Proven leadership of data center exit programs, large-scale migrations, and platform modernization initiatives (e.g., VMware workload migrations) executed on aggressive, business-critical timelines.
- Strong understanding of Infrastructure as a Service (IaaS) operating models across on-prem, hybrid, and cloud environments, with demonstrated success driving automation-first service delivery.
- Experience applying Site Reliability Engineering (SRE) practices to infrastructure operations, including reliability targets, incident management, operational metrics, and continuous improvement.
- Demonstrated ability to lead 24x7 operational teams, ensuring platform stability, resiliency, and disciplined execution during periods of transformation.
- Strong partnership and communication skills, with a track record of collaborating across Architecture, Engineering, Platform, Application, Security, and Finance (FinOps) teams to deliver measurable operational and financial outcomes.