Key Responsibilities
Monitoring & Incident Response
- Monitor server and network infrastructure using dashboards and monitoring tools to detect anomalies and alerts
- Provide first-level response to service tickets, meeting the 2-hour initial response SLA
- Perform routine health checks on Cisco routers, switches, and server hardware
- Follow standardized operational procedures and runbooks for common incidents
Troubleshooting & Escalation
- Perform basic troubleshooting on Dell PowerEdge servers using iDRAC, RAID management utilities, and other remote management tools
- Escalate unresolved or complex issues to L2/L3 engineers with detailed handover notes
- Assist on-site technicians and vendors with hardware replacements, tracking all activities in the ticketing system
Maintenance & Operations Support
- Execute routine maintenance tasks such as scheduled reboots, configuration backups, and pre-approved firmware updates under supervision
- Maintain awareness of scheduled maintenance windows and monitor for unexpected impacts
- Contribute to continuous improvement of runbooks and documentation by identifying gaps and suggesting updates
Documentation & Communication
- Log all incidents, actions, and resolutions accurately in the ticketing system
- Provide clear status updates during shift handovers
- Document issues for knowledge management and escalation purposes