Key Responsibilities
Advanced Troubleshooting & Operations
- Diagnose and resolve escalated server and network incidents using Dell iDRAC, Redfish, and Cisco CLI tools
- Apply firmware, BIOS, and driver updates to Dell PowerEdge servers, scheduling the work to minimize service disruption
- Perform IOS/NX-OS firmware and software updates on Cisco routers and switches, following change management protocols
- Conduct hardware break/fix procedures, coordinating with Dell for warranty claims, parts, and technician dispatch
- Perform regular network health audits and performance analysis, identifying and mitigating potential bottlenecks
Collaboration & Team Support
- Mentor L1 engineers through knowledge transfer and technical assistance
- Collaborate with SRE and IT teams to improve monitoring dashboards and alert thresholds
- Participate in blameless post-mortems for major incidents, driving root cause analysis and preventive actions
- Maintain and update operational runbooks, network diagrams, and technical documentation
Infrastructure Lifecycle & On-Call Support
- Support hardware lifecycle management, including provisioning, asset tracking, and vendor coordination
- Provide 24x7 on-call support for critical escalations and respond rapidly to incidents
- Assist with capacity planning by providing data-driven analysis of infrastructure utilization and growth