Major Incident Commander

antzlab technology services pvt. ltd.

Bengaluru, India

5-7 Years

Posted 16 hours ago

Job Description

We are seeking a highly accomplished Principal Incident Commander / Director – Incident Management to lead enterprise-wide response to critical incidents across complex, large-scale, and globally distributed infrastructure environments.

This role operates at the intersection of technology leadership, crisis management, and business continuity, requiring the ability to make high-stakes decisions, influence senior stakeholders, and drive rapid resolution during mission-critical outages. The individual will serve as the ultimate authority during major incidents, ensuring minimal business disruption and long-term resilience.

Requirements

Strategic Responsibilities

Own and lead enterprise-level incident management strategy across global operations
Act as the executive Incident Commander for P0/P1 incidents impacting business-critical systems
Establish and drive incident governance frameworks, SLAs, and response protocols
Lead cross-functional crisis response involving Network, Cloud, Infrastructure, Security, and Field Operations
Influence and align with C-suite and senior leadership during high-impact incidents
Drive business continuity and service resilience initiatives

Operational Leadership

Command and orchestrate war rooms and global bridge calls with multiple stakeholders
Serve as the highest escalation point for critical outages and service disruptions
Ensure rapid triage, containment, and resolution of incidents with minimal downtime
Drive real-time decision-making under ambiguity and pressure
Oversee post-incident reviews and enforce accountability across teams

Technical Expertise

Deep expertise in enterprise networking and distributed systems: BGP, OSPF, EIGRP, TCP/IP, and QoS; WAN, SD-WAN, and data center architectures (spine-leaf)
Strong understanding of load balancing, DNS, DHCP, and network security, as well as latency, packet loss, and performance optimization
Familiarity with cloud platforms and hybrid infrastructure environments
Ability to engage in hands-on technical triage when required
Lead Root Cause Analysis (RCA) at an organizational level
Drive preventive engineering, automation, and process maturity
Establish a culture of proactive monitoring and early detection
Enhance incident response playbooks, runbooks, and training programs

Preferred Qualifications

ITIL Expert / Advanced Incident Management certifications
Exposure to Disaster Recovery (DR) & Business Continuity Planning (BCP)
Experience with automation, observability platforms, and AI-driven monitoring
Track record of driving transformation in incident management practices
5-7 years of experience in Network Engineering, SRE, NOC, or Cloud Operations
Proven experience handling enterprise-scale, high-impact incidents globally
Prior experience in large enterprises / telecom / hyperscalers / global tech organizations
Strong leadership presence with the ability to influence without authority
Experience working in 24x7, mission-critical environments