Act as the primary point of contact and coordinator for high-priority production incidents, ensuring timely resolution and clear communication.
Lead incident triage calls, drive root cause identification, and coordinate with application, infrastructure, and support teams to restore services within agreed SLAs.
Apply ITSM and ITIL best practices to manage the full incident lifecycle, including logging, classification, prioritization, escalation, and closure.
Maintain accurate and detailed incident records in ServiceNow (or similar ITSM tools), ensuring proper documentation of impact, actions taken, and resolution steps.
Communicate incident status, risks, and recovery plans to stakeholders and management through timely updates and post-incident summaries.
Facilitate post-incident reviews, identify root causes, and drive corrective and preventive actions to reduce recurrence of issues.
Monitor incident trends and key operational metrics, providing insights and recommendations for service improvement and stability.
Collaborate with change, problem, and service management teams to align incident processes with broader ITSM practices.
Support continuous improvement of incident management workflows, runbooks, and escalation matrices based on lessons learned.
Partner with support and engineering teams to enhance monitoring, alerting, and readiness for critical production systems.

Minimum Qualifications:
Bachelor's degree in Engineering, Computer Science, Information Technology, or related field (B.Tech or equivalent).
3-5 years of hands-on experience in production support and incident management within an IT or technology environment.
Strong practical experience with ITSM frameworks and processes, including incident, problem, and change management.
Solid understanding of ITIL principles and their application in day-to-day operations and service management.
Proven experience acting as an Incident Manager, leading incident bridges and coordinating multiple technical teams.
Experience working with ServiceNow or similar ITSM tools for incident tracking, workflow management, and reporting.
Excellent communication, coordination, and stakeholder management skills, especially during high-pressure situations.