JOB SUMMARY:
The Vice President – AIOps and System Reliability Engineering will be responsible for driving the strategy, automation, and transformation of enterprise IT operations through Agentic AI, autonomous operations, and Site Reliability Engineering (SRE).
Core Accountability:
Lead the creation of a self-healing, predictive, and highly automated IT environment by leveraging AIOps and AI-driven frameworks to:
- Eliminate 100% of L1 support
- Reduce L2 support effort by at least 50%
Key Focus Areas:
- Strategic Leadership: Define and execute an enterprise-wide Autonomous IT Operations and Agentic AI roadmap.
- Operational Transformation: Implement AI-driven monitoring, anomaly detection, auto-triaging, intelligent remediation, and closed-loop automation.
- Infrastructure & SRE: Oversee Data Center/DR, hybrid cloud, Kubernetes, observability, SLOs/SLIs, error budgets, and reliability engineering.
- Innovation: Drive modernization of CI/CD, IaC, and DevSecOps with embedded AI, and build a Center of Excellence for autonomous operations.
- Incident & Service Management: Automate incident management, root cause analysis, and ITSM (ServiceNow) processes with AI playbooks and self-service capabilities.
- People & Stakeholder Management: Lead SRE/automation teams, manage vendors, and engage with senior leadership on operational health and transformation progress.
Experience Required:
15+ years of experience in infrastructure engineering, SRE, DevOps, and IT operations automation, with strong hands-on expertise in AIOps platforms, Agentic AI, cloud (AWS/Azure/OCI), Kubernetes, ServiceNow, and modern automation tools.