Incident Manager

8-12 Years
  • Posted 9 hours ago

Job Description

Join us as we work to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.

Business Title: Incident Manager

Role summary:


Help restore critical services with clarity, coordination, and strong operational judgment. The Incident Manager is responsible for leading incident response activities, reducing business impact, and coordinating cross-functional teams to restore services as quickly and safely as possible. This role is based in Pune, India, and follows a hybrid work model. The person in this role will also help strengthen service management practices by reviewing incident trends, improving response readiness, and supporting operational continuity across infrastructure and platform environments. This position reports to the Director.

Team summary:


As an Incident Manager in the Cloud Infrastructure Engineering organization, you will be part of the Service Management team, which is focused on restoring impacted services to an operational state quickly while managing risk and ongoing business impact. The team coordinates incident response efforts, keeps teams aligned on service restoration priorities, and ensures that resolution remains the immediate focus until business impact has been addressed.

This role also contributes to the broader effectiveness of service management by reviewing historical incident data, identifying patterns, and helping define practical actions that can reduce recurrence or improve response readiness. The Incident Manager facilitates discussions across technical and business stakeholders to review incident trends, evaluate response effectiveness, and communicate findings, risks, and recommendations to senior management.

Success in this role requires a strong understanding of enterprise infrastructure, service dependencies, and production operations, along with the ability to guide structured discussions during time-sensitive situations. The role works closely with technical responders, operational stakeholders, and leadership to maintain alignment throughout the incident lifecycle, from identification and triage through recovery and follow-up. Because incidents can occur at any time, this role requires schedule flexibility, including availability across nights, weekends, and holidays, based on business need.

Essential Job Responsibilities:

  • Lead major incident response activities to restore impacted services quickly while balancing technical risk, business impact, and communication needs.
  • Manage incident bridges and cross-functional coordination across engineering, infrastructure, site reliability, and operations teams during high-priority events.
  • Assess service disruptions to determine severity, business impact, escalation requirements, and appropriate response actions.
  • Communicate timely and accurate incident updates to stakeholders, including technical teams, operational partners, and leadership.
  • Analyze historical incident data, response patterns, and service trends to identify improvement opportunities and recurring operational risks.
  • Drive post-incident follow-up by coordinating action tracking, handoffs to problem management or engineering teams, and review of operational learnings.
  • Maintain incident management processes, documentation standards, and service management practices that support operational consistency and readiness.
  • Collaborate with infrastructure and site reliability partners to improve service restoration procedures, escalation paths, and operational visibility.
  • Use AI tools, where appropriate, to improve incident management workflows, for example by summarizing incident timelines, identifying recurring themes in operational data, drafting stakeholder communications, or comparing response patterns across incidents. Validate all outputs carefully before use.

Additional Job Responsibilities:

  • Prepare incident summaries, trend reports, and operational dashboards for leadership review.
  • Support continuous improvement efforts related to service management, incident response, and operational governance.
  • Review incident records for completeness, quality, and alignment with established process expectations.
  • Partner with technical teams to identify gaps in monitoring, alerting, runbooks, or recovery documentation.
  • Contribute to readiness activities such as simulations, process walkthroughs, or response playbook updates.
  • Assist with service review discussions by highlighting risk areas, recurring patterns, and response improvement opportunities.
  • Coordinate with stakeholders during change-related or release-related incidents when additional operational alignment is needed.
  • Participate in on-call rotations or flexible coverage aligned to business and incident management needs.

Expected Education & Experience:

  • 8 to 12 years of experience in Incident Management, Service Management, IT Operations, Site Reliability, Infrastructure Operations, or a related operational technology function.
  • Experience leading incident response for complex production environments with multiple technical stakeholders and service dependencies.
  • Working knowledge of incident lifecycle management, service restoration practices, escalation models, and operational risk management.
  • Experience coordinating across infrastructure, platform, application, and operations teams during high-impact service events.
  • Ability to communicate clearly and effectively with both technical and non-technical audiences during time-sensitive situations.
  • Experience reviewing operational data and incident trends to identify recurring issues and support service improvement efforts.
  • Familiarity with service management frameworks such as ITIL and with structured documentation in incident or workflow management systems.
  • Understanding of enterprise infrastructure, distributed systems, availability concepts, and production support operations.
  • Bachelor's degree in Information Technology, Computer Science, Engineering, or a related field, or equivalent relevant experience.

Job ID: 145958443
