Incident Management
- Oversee the full incident lifecycle, from identification to resolution, ensuring adherence to SLAs and minimizing business impact.
- Manage major incidents (P1/P2) with urgency, coordinating cross-functional teams to restore services as quickly as possible.
- Act as the central point of communication for all stakeholders during incidents, providing regular updates on status, impact, and resolution timelines.
- Ensure accurate documentation of incidents, including root cause analysis (RCA) follow-up and post-incident reports.
24/7 Coverage
- Together with the Operations Command Center team, provide 24/7 support for incident response, including on-call responsibilities as part of a rotational schedule.
- Proactively monitor high-priority services and potential risks, taking preventative action where necessary.
- Develop and maintain escalation procedures to ensure critical incidents receive appropriate attention.
Process Optimization and Improvement
- Continuously analyze the incident management process to identify opportunities for efficiency, speed, and accuracy improvements.
- Collaborate with problem management teams to address recurring incidents and implement permanent solutions.
- Deploy process enhancements to improve metrics such as First Time Resolution and MTTR, along with KPIs and dashboards to measure incident management performance.
Collaboration and Leadership
- Foster strong relationships with internal teams (Global Technical ServiceDesk, Level 2 operations, Project teams, etc.) and external vendors to ensure streamlined communication during incidents.
- Drive incident-related meetings, including war rooms, service reviews, and RCA sessions.
- Train and mentor Operations Command Center team members and stakeholders on incident management best practices.
Qualifications
Required:
- Proven experience (5+ years) in incident management within a large-scale, high-tech enterprise environment.
- Strong understanding of ITIL/ITSM frameworks and processes.
- Experience managing major incidents (P1/P2) and coordinating resolution efforts across multiple teams.
- Familiarity with monitoring tools (e.g., Splunk, SolarWinds, Zabbix) and ticketing systems (e.g., ServiceNow, Jira).
- Strong leadership, decision-making, and problem-solving skills, with the ability to remain calm under pressure.
- Exceptional communication skills for liaising with both technical and non-technical stakeholders.
Preferred:
- ITIL v4 Certification (Foundation or higher).
- Experience with cloud environments (AWS, Azure) and DevOps methodologies.
- Understanding of automation tools and processes for proactive incident management.