Responsibilities
- We are looking for a candidate with 12+ years of experience in IT operations management to take on the key responsibilities below.
Key Responsibilities
Problem Management
- Own and manage the end-to-end lifecycle of Problems, including identification, prioritization, investigation, root cause analysis (RCA), and closure.
- Facilitate major problem lookbacks following Sev1 and Sev2 incidents to identify systemic issues and prevent recurrence.
- Partner with engineering and platform teams to drive permanent corrective actions and track remediation progress.
- Analyze incident and problem trends to identify patterns, risk areas, and improvement opportunities.
- Ensure Problem records meet enterprise quality standards, including documentation, RCA completeness, and action tracking.
Major Incident Management
- Act as a Major Incident Manager during high-severity incidents, providing leadership, structure, and coordination across technical and business stakeholders.
- Ensure timely escalation, stakeholder communication, and executive visibility during major incidents.
- Guide teams through incident response processes to minimize business impact and mean time to restore (MTTR).
- Support post-incident reviews and ensure lessons learned are incorporated into operational practices.
- Contribute to the continuous improvement of incident response models, severity definitions, and escalation processes.
Availability Management
- Monitor and manage the availability of critical applications and services, ensuring alignment with defined SLOs and availability targets.
- Produce and review availability, outage, and MTTR metrics, translating operational data into meaningful insights for technology and business leaders.
- Drive improvements in monitoring, alerting, and observability to enable proactive detection of service degradation.
________________________________________
Technical & Process Skills
- Deep working knowledge of Incident, Major Incident, Problem, Change, and Availability Management aligned to ITIL practices
- Proven capability to lead Sev1 and Sev2 incidents end-to-end
- Strong understanding of severity definitions, escalation triggers, and IC engagement expectations
- Familiarity with SLO concepts and service reliability expectations
Professional Competencies
Communication & Influence - Communicates complex technical issues in clear, business-relevant language
Incident Leadership & Decision-Making - Maintains composure and clarity under pressure during high-impact incidents
Accountability & Ownership - Follows incidents and problems through to sustainable resolution
Collaboration & Partnership - Works effectively across global teams and time zones
Continuous Improvement Mindset - Actively identifies opportunities to improve
Certifications
Qualifications
ITIL - Preferred
12+ years of experience in IT operations management, covering the key responsibilities stated in this job description.