Inspira Enterprise India is seeking a diligent and proactive Cloud/Infrastructure Monitoring Administrator L1 to join our operations team. This role is crucial for ensuring the continuous availability and performance of our diverse IT infrastructure, spanning Windows, Linux, VMware, and AWS environments. The ideal candidate will be adept at using monitoring tools like SolarWinds, Splunk, and Uptrends, performing initial troubleshooting, and ensuring timely escalation of issues in a 24x7 rotational shift model.
Key Responsibilities
- Monitoring & Alert Management:
- Continuously monitor a hybrid infrastructure, including Windows, Linux, VMware, and AWS environments, using specialized tools such as SolarWinds, Splunk, and Uptrends.
- Swiftly acknowledge, validate, and respond to alerts related to critical metrics like CPU utilization, memory consumption, disk space, service availability, and cloud instance health.
- Conduct basic troubleshooting steps, including ping tests, service status checks, validating console access, and reviewing VM statuses.
- Accurately escalate issues to L2/engineering teams as per defined Standard Operating Procedures (SOPs), so that incidents move to resolution without delay.
- Cloud & Virtualization Support:
- Monitor AWS EC2 instance health, availability, and connectivity directly via the AWS Console.
- Validate vCenter alerts, check ESXi host connectivity and datastore usage, and verify correct VM power status within VMware environments.
- Support incident triage specifically for newly migrated AWS environments, assisting in initial assessment and categorization of issues.
- Network & Connectivity Monitoring:
- Promptly respond to basic network-related alerts such as link down, high latency, and interface errors.
- Understand and perform basic network troubleshooting, including ping, traceroute, DNS resolution checks, and physical link status verification.
- Utilize network monitoring data to accurately escalate WAN/ILL/ISP issues to internal network teams or external vendors as required.
- Ensure proper alert routing and verify that network device health is accurately reported on monitoring dashboards.
- Operational Tasks & Tools:
- Accurately create and update ServiceNow tickets for all reported incidents, outages, and routine monitoring tasks.
- Utilize iDRAC or equivalent remote management tools to access servers remotely and perform hardware health checks.
- Support patch-window readiness by validating service status before maintenance and initiating reboots after maintenance windows.
- Actively participate in shift handovers, meticulously maintain incident logs, and contribute to the team's knowledge base and documentation.
Preferred Candidate Profile
- Operating Systems: Basic hands-on experience with Windows Server and Linux OS for monitoring, basic administration, and troubleshooting.
- Virtualization: Exposure to VMware ESXi/vCenter and familiarity with virtual infrastructure environments.
- Cloud Fundamentals: Familiarity with AWS (console navigation, EC2 instance monitoring).
- Networking Fundamentals: Understanding of core network technologies, including:
- TCP/IP, DNS, DHCP, ICMP, basic routing, and switching concepts.
- Proficiency with network troubleshooting commands such as ping and traceroute, and an understanding of how to trace a packet's path through the network.
- Conceptual understanding of LANs, WANs, firewalls, and load balancers.
- Monitoring Tools: Experience using SolarWinds, Splunk, or similar infrastructure monitoring tools.
- Communication: Good written and verbal communication skills for effective reporting and collaboration.
- Availability: Willingness to work in a 24x7 rotational shift model, including weekends and holidays.
- Attention to Detail: Strong attention to detail and strict adherence to incident handling protocols.