This position is responsible for overseeing the day-to-day operations of the NOC Engineer / IT Operations
department.
Duties and Responsibilities:
- 24/7/365 Monitoring: Continuously monitor internal and production systems to ensure service uptime and performance. Upon detecting issues or alerts, assess the impact, troubleshoot toward a first-level resolution, and escalate to Level 2 if required.
- Incident Response and Troubleshooting: Quickly diagnose and resolve server or network alerts, application issues, and infrastructure events using systematic troubleshooting and root cause analysis.
- Infrastructure Management: Support, manage, and deploy robust network and cloud infrastructure, ensuring high availability and scalability.
- Monitoring and Observability: Implement, maintain, and optimize monitoring tools such as SolarWinds, Dynatrace, Stackdriver (GCP), or other cloud-native solutions to gain insights into system performance and health.
- Documentation and Knowledge Sharing: Maintain up-to-date operations documentation, including runbooks, incident reports, support guides, and troubleshooting procedures.
- Shift Handover & Communication: Participate in effective shift handovers and maintain continuous communication across global teams to ensure operational continuity.
- Ticketing and Reporting: Manage issues and service requests through ticketing systems such as ServiceNow, Jira, or Service Desk. Ensure accurate status updates and timely resolution.
- Collaboration: Work closely with DevOps, Development, and Infrastructure teams to ensure performance, reliability, and compliance across systems.
- Change Management: Participate in planned maintenance activities, patching, and deployments while adhering to change control procedures.
- System Administration: Provide support for Linux and Windows-based systems, ensuring system patches, performance tuning, and user access controls are in place.
- Flexibility: Work in shifts and provide support for critical incidents according to the rotation schedule.
- Communication Skills: Possess strong oral and written communication skills with the ability to constructively resolve conflicts and articulate technical issues to various audiences.
Knowledge, Skills, and Abilities:
- Operating Systems Expertise: Strong working knowledge of Windows and Linux operating systems, including administration of services such as O365 and SCCM.
- Networking Proficiency: Good understanding of networking fundamentals such as LAN, WAN, TCP/IP, HTTP, FTP, DNS, DHCP, and VPNs, and of network utilities such as SSH, curl, and traceroute.
- Troubleshooting Skills: Excellent diagnostic and problem-solving abilities across PC hardware, operating systems, application layers, and network infrastructure.
- Scripting and Automation: Basic to intermediate scripting experience in Shell, Python, or Perl to automate routine tasks is an added advantage.
- Monitoring and Observability Tools: Hands-on experience with monitoring and alerting platforms like SolarWinds, Dynatrace, Stackdriver (or other cloud-native tools), with the ability to analyze and act on telemetry data.
- Communication and Collaboration: Strong written and verbal communication skills with the ability to clearly articulate technical information and collaborate effectively across cross-functional teams.
- Analytical and Organizational Abilities: Detail-oriented with strong analytical and time management skills, capable of handling multiple priorities with minimal supervision.
- Shift Flexibility: Willingness to work in a 24/7/365 operational environment, including night shifts and weekends as needed.