Roles & Responsibilities:
L3 System Operations and Support
- Act as the primary Level 3 escalation point for complex incidents affecting hosted systems, services, servers, and network devices.
- Perform proactive and reactive health checks, diagnostics, and high-level troubleshooting to ensure continuous availability and performance.
- Conduct Root Cause Analysis (RCA) for major incidents and implement permanent remediation actions.
- Manage incident and problem workflows in the Freshservice ITSM platform, with clear documentation and communication.
- Adhere to ITIL-aligned processes for Incident, Problem, and Change Management.
Advanced System Administration
- Provide expert system administration for enterprise hosting in on-premises and cloud environments.
- Perform troubleshooting, patch administration, installation, and remote system/network administration.
- Drive automation and AI-driven infrastructure management, including Infrastructure as Code and scripting.
- Configure, maintain, and optimize core services such as proxy servers and web servers (e.g., Apache).
- Manage the Microsoft 365 (M365), Azure, and other core service admin centers.
Monitoring and Infrastructure Maintenance
- Design, configure, and maintain enterprise monitoring solutions, especially Zabbix, for servers, network devices, and critical services.
- Lead technical improvement projects, including OS and package patching, system upgrades, and capacity planning.
- Provide technical support and knowledge transfer to L1 and L2 Service Desk/Operations teams.