
Role Overview
We are looking for a highly experienced Senior Incident Manager (Incident Commander) to lead and orchestrate high-severity incident response across large-scale distributed infrastructure environments.
This role requires a strong blend of technical depth, leadership capability, and real-time decision-making under pressure.
Must-Have Skills (Non-Negotiable)
1. Incident Command & Leadership
- Proven experience handling P1 / P0 incidents in enterprise environments
- Ability to act as Incident Commander during critical outages
- Strong decision-making in high-pressure, multi-system failure scenarios
- Experience managing war rooms / bridge calls
2. Advanced Networking & Infrastructure Expertise
- Strong hands-on knowledge of:
BGP, OSPF, EIGRP
TCP/IP, Subnetting, QoS
WAN / SD-WAN / Data Center Networks
- Understanding of:
Spine-leaf architecture
Routing issues, latency, packet loss troubleshooting
- Exposure to:
Load balancing, DNS, DHCP, Network security
3. Cross-Functional Coordination
- Experience coordinating with:
Network teams
Infra / Cloud teams
Field Operations
- Ability to drive resolution across multiple teams simultaneously
4. Stakeholder Communication
- Strong communication skills with ability to:
Present updates to leadership / business stakeholders
Translate technical issues into business impact
- Clear, structured, and confident communication during incidents
5. Root Cause Analysis (RCA)
- Experience leading deep-dive investigations
- Ability to identify:
System failures
Network anomalies
Performance issues
- Strong documentation and reporting skills
6. Monitoring & Troubleshooting
- Experience with enterprise monitoring tools
- Ability to analyze:
Latency
Throughput
Availability
- Proactive issue identification
7. Availability & Shift Flexibility
- Willingness to work in:
24x7 environment
On-call rotations
- Ability to respond to incidents in real time
Good-to-Have Skills
- ITIL / Incident Management certifications
- Experience in large-scale global operations (NOC / Cloud / E-commerce)
- Knowledge of:
Disaster Recovery & Business Continuity
Automation / scripting (Python, etc.)
- Experience in process improvement / problem management
Key Responsibilities (Summary)
- Lead critical incident response and resolution
- Act as primary escalation point
- Drive technical triage and coordination
- Communicate with executive stakeholders
- Perform RCA and implement preventive measures
- Improve incident management processes
Ideal Candidate Profile
- 8–15+ years of experience
- Background in:
Network Engineering / NOC / Cloud Ops
- Prior experience at Verizon, Nokia, Microsoft, or other large enterprise environments (preferred)
Job ID: 147065243