Talent Toppers

Vice President - Site Reliability Engineering & AI Ops

  • Posted 12 days ago

Job Description

Experience: 15+ years

The Vice President of AIOps and Site Reliability Engineering is a visionary technology leader responsible for driving the strategy, automation, and transformation of the enterprise IT operations landscape.

Strategic & Autonomous Operations Leadership

o Define and execute a long-term strategy for agentic AI, autonomous operations, and AI-driven service management.

o Build and operationalize an enterprise-wide framework for autonomous IT operations based on AIOps (AI for IT Operations), ensuring seamless integration with infrastructure, cloud, and SRE functions.

o Lead the implementation of AI agents, decisioning engines, and self-healing automation across service operations.

Operational Excellence & Automation Transformation

o Eliminate L1 support entirely through predictive automation, intelligent routing, and autonomous resolution workflows.

o Deliver a 50% reduction in L2 support workload through AI-based diagnostics, automated remediation, and knowledge orchestration.

o Oversee the implementation of AI-driven monitoring, anomaly detection, automated triage, and automated incident remediation.

o Optimize IT operations through AIOps, observability platforms, and closed-loop automation.

Infrastructure Engineering & SRE Leadership

o Oversee all data center (DC) and disaster recovery (DR) operations, including servers, storage, databases, networking, and hybrid cloud infrastructure.

Innovation & Technology Modernization

o Identify, evaluate, and implement emerging technologies, including agentic AI, GenAI copilots, predictive operations, and advanced automation frameworks.

o Lead modernization of CI/CD, IaC, and DevSecOps with embedded AI and smart orchestration.

o Build a center of excellence for autonomous operations and AI-first service engineering.

Incident, Problem & Change Management Automation

o Deploy automated playbooks, AI-guided root-cause analysis, and recommendation engines.

o Implement self-service and conversational AI capabilities across ITSM platforms.

o Ensure proactive detection (mean time to detect, MTTD, under 5 minutes) and rapid recovery (mean time to recovery, MTTR, under 1 hour) through automation.

People & Vendor Leadership

o Lead and mentor engineering, SRE, and automation teams with a strong culture of innovation and accountability.

Required Skills & Experience:

Bachelor's degree in Engineering, Computer Science, or a related field (B.E./B.Tech preferred).

15+ years of progressive experience in infrastructure engineering, SRE, DevOps, and IT operations automation.

Strong hands-on experience with AIOps platforms, agentic AI models, and autonomous operations frameworks.

Proven background in large-scale IT modernization, observability, and reliability engineering.

Expertise in cloud operations (AWS/Azure/OCI), Kubernetes, container orchestration, and IaC.

Deep understanding of AI/ML, automation platforms, scripting (Python/Shell), and integration pipelines.

Experience with ITSM platforms, incident automation, and workflow orchestration (ServiceNow preferred).

Strong leadership capabilities with experience in driving major automation transformations.

Proven expertise in managing large-scale, distributed systems with a focus on scalability, reliability, and security.

Job ID: 143008837