Key Responsibilities
Advanced Troubleshooting & Incident Management
- Diagnose and resolve complex server and network incidents using Dell iDRAC, Redfish, and Cisco CLI tools
- Perform root cause analysis, participate in blameless post-mortems, and implement preventative actions
- Provide 24x7 on-call support for critical escalations to ensure rapid response for production systems
Server & Network Maintenance
- Execute firmware, BIOS, and driver updates on Dell PowerEdge servers following standardized procedures
- Manage IOS/NX-OS firmware and software updates on Cisco routers and switches in compliance with change management procedures
- Conduct network health audits and performance analysis, and recommend optimization measures
Hardware Lifecycle & Vendor Coordination
- Manage hardware break/fix procedures and coordinate with Dell support for warranty claims and on-site technician dispatch
- Assist in asset tracking, equipment provisioning, and hardware lifecycle management
- Support capacity planning with data-driven insights on infrastructure utilization trends
Team Mentoring & Knowledge Transfer
- Mentor L1 engineers on complex ticket resolution and technical skill development
- Maintain and update operational runbooks, network diagrams, and technical documentation
- Collaborate with cross-functional teams (e.g., SRE, IT leadership) to refine monitoring dashboards and alert thresholds
Process & Automation
- Use automation tools such as Ansible and Python to reduce operational toil
- Maintain CMDB entries and ensure configuration backups are up to date