A day in your life at MKS:
We are seeking a seasoned Senior Manager – Infrastructure Operations to lead and oversee enterprise-level NOC operations and Major Incident Management (MIM), with extended ownership across Windows, Linux, Storage, and Backup environments.
This role requires a strong blend of technical expertise, ITIL process knowledge, and leadership capability to ensure high availability, service reliability, and operational excellence. The incumbent will act as a technical mentor, drive continuous improvement, and enable teams to become self-sufficient in decision-making and problem resolution.
You Will Make an Impact By:
NOC & Major Incident Management
- Lead and manage 24x7 NOC operations, ensuring proactive monitoring and availability of infrastructure services.
- Own the Major Incident Management (MIM) process, including incident triage, stakeholder communication, escalation, and resolution.
- Drive fast restoration of services and minimize business impact during critical incidents.
- Conduct Post-Incident Reviews (PIR) and ensure corrective and preventive actions are implemented.
ITIL & Service Management
- Implement and govern ITIL processes, including:
  - Incident Management
  - Problem Management
  - Change, Release & Deployment Management
  - Continual Service Improvement (CSI)
- Ensure adherence to SLAs, OLAs, and KPIs.
- Drive process maturity and automation across operations.
Infrastructure & Platform Oversight
- Provide governance and technical oversight across:
  - Windows Server Administration
  - Linux/Unix environments
  - Storage & Backup solutions
- Ensure high availability and drive capacity planning and performance optimization.
- Collaborate with engineering teams on infrastructure modernization initiatives.
Cloud & Automation
- Provide hands-on expertise and governance across AWS and/or Azure environments.
- Promote automation, scripting, and orchestration using PowerShell, Python, or similar tools.
- Focus on reducing manual intervention and improving operational efficiency.
Leadership & Stakeholder Management
- Lead, mentor, and develop high-performing NOC and infrastructure teams.
- Enable team autonomy in troubleshooting, decision-making, and incident resolution.
- Act as a key stakeholder interface, providing regular updates to leadership and business teams during incidents.
- Drive a culture of accountability, collaboration, and continuous learning.
Continuous Improvement & Governance
- Identify operational inefficiencies and drive continuous improvement initiatives.
- Analyze trends and implement preventive measures to avoid recurring incidents.
- Ensure compliance, audit readiness, and adoption of best practices across infrastructure operations.
Key Skills and Expertise Required:
Technical Expertise
- Strong hands-on experience in:
  - NOC Operations & Monitoring Tools
  - Major Incident Management
  - Windows Server Administration
  - Linux/Unix systems
  - Storage & Backup technologies
- Good exposure to AWS and/or Azure cloud platforms
- Proficiency in automation & scripting (PowerShell, Python, shell)
Process & Framework
- Strong understanding of ITIL framework (Incident, Problem, Change, Release, CSI)
- ITIL Certification (preferred)
Leadership & Soft Skills
- Proven experience in team management and leadership roles
- Excellent stakeholder management and communication skills
- Strong analytical and problem-solving skills
- Ability to perform effectively in high-pressure, critical incident scenarios
Preferred Qualifications
- Experience managing enterprise-scale infrastructure environments
- Exposure to DevOps practices and tools
- Relevant certifications (e.g., AWS/Azure, ITIL, Microsoft, Linux)