Total Experience - 12+ years
Skills - Mandatory - Cloud Infrastructure & IT Operations
Skills - Primary - Cloud Infra Management, ITIL, Service Delivery
Work Location - Trivandrum/Kochi
Job Purpose
We are seeking an experienced Service Delivery Manager (SDM) with deep expertise in IT operations, cloud infrastructure management, and a forward-thinking vision for AI-driven modernisation. The SDM will serve as the single point of accountability for end-to-end service delivery governance, operational excellence, and strategic leadership of a cloud-hosted infrastructure support environment, ensuring high availability, reliability, and performance across all managed services.
The ideal candidate brings 12+ years of progressive experience in IT service delivery and operations; experience managing a production support team for a medium-sized enterprise; a hands-on understanding of cloud platforms (preferably Azure or AWS); a proven ability to manage 24x7 infrastructure support teams; and a strategic mindset to drive AI-led modernisation of support operations. This person must be proactive, an excellent communicator, data-driven, commercially aware, and able to manage senior stakeholder relationships with confidence.
Service Delivery & Governance
- Own end-to-end service delivery for cloud infrastructure and managed operations, acting as the single point of accountability for service performance.
- Define, negotiate, monitor, and enforce SLAs, SLOs, and KPIs across all service towers.
- Establish and govern ITIL-based processes, including Incident, Problem, Change, and Release Management.
- Conduct regular service reviews, Quarterly Business Reviews (QBRs), and governance cadences with client and internal stakeholders.
- Manage escalations proactively, driving resolution and stakeholder communication in real time.
- Track and improve CSAT, NPS, and operational satisfaction metrics across all delivery functions.
- Maintain and report on operational dashboards covering SLA compliance, incident volumes, MTTR, MTTD, and service health.
Cloud Infrastructure Management
- Oversee 24x7 cloud operations across Azure (primary), AWS, or GCP environments, ensuring high availability and reliability.
- Govern infrastructure health, including uptime, performance, capacity planning, disaster recovery readiness, and RTO/RPO compliance.
- Manage vendors, MSPs, and technology partners delivering infrastructure support services, holding them accountable to contractual SLAs.
- Ensure security compliance, vulnerability governance, patch management, and adherence to Information Security Management policies.
- Drive cloud cost optimisation and FinOps practices by monitoring usage, identifying waste, and enforcing right-sizing disciplines.
- Collaborate with engineering and architecture teams on cloud migrations, platform upgrades, and infrastructure modernisation initiatives.
IT Operations & Team Leadership
- Lead and mentor a cross-functional infrastructure support team covering L1, L2, and L3 support tiers.
- Manage shift-based 24x7 operations with well-defined escalation frameworks, on-call schedules, and coverage plans.
- Drive operational maturity through creation and maintenance of runbooks, playbooks, and standard operating procedures (SOPs).
- Track MTTR and MTTD trends; initiate and own Problem Management reviews for recurring incidents.
- Conduct Root Cause Analysis (RCA) for critical and recurring incidents, implementing preventive actions to reduce future outages.
- Champion a culture of accountability, continuous improvement, and proactive operations across the team.
Stakeholder & Commercial Management
- Manage client relationships at senior and executive levels, ensuring high confidence and transparency in service delivery.
- Prepare and present service performance reports, operational dashboards, and executive briefings.
- Participate in contract reviews, SOW negotiations, renewals, and change order management.
- Identify and drive account growth through expanded service offerings and proactive value demonstration.
- Coordinate with procurement, legal, and finance for commercial governance and vendor management activities.
AI-Led Modernisation & Innovation
- Define and drive an AI-led roadmap for infrastructure operations modernisation aligned with business goals.
- Champion adoption of AIOps platforms such as Dynatrace and ServiceNow AIOps to enable predictive incident detection and automated remediation.
- Explore and implement LLM-based assistants and intelligent bots for L1 support automation and knowledge management.
- Promote infrastructure-as-code (IaC) and GitOps practices to improve provisioning speed, consistency, and compliance.
- Drive shift-left strategies, moving the team from reactive support to proactive and predictive operational models.
- Partner with engineering and product teams to embed observability, self-healing, and reliability capabilities into cloud environments.
- Build a continuous improvement culture backed by operational data, AI-driven insights, and regular retrospectives.
Job Specification / Skills and Competencies
Must Have
- 12+ years of progressive experience in IT service delivery, operations management, or managed services.
- Proven experience managing cloud-hosted infrastructure environments - Azure (strongly preferred), AWS, or GCP.
- Strong working knowledge of ITIL v3/v4 - Incident, Problem, Change, and Service Level Management.
- Experience managing 24x7 infrastructure or NOC support operations with L1/L2/L3 team structures.
- Demonstrated ability to manage SLAs, run incident bridges, conduct war rooms, and own escalations end-to-end.
- Excellent stakeholder management and executive communication skills - able to present to C-suite audiences.
- Hands-on experience with ITSM platforms such as ServiceNow, Remedy, or equivalent.
- Strong analytical mindset - ability to build dashboards, interpret operational data, and drive improvement actions.
Good to Have
- Azure Administrator, AWS Solutions Architect, or equivalent GCP cloud certification.
- ITIL 4 Foundation or Managing Professional certification.
- Exposure to SRE principles - error budgets, SLOs, toil management, and reliability engineering practices.
- Experience evaluating or deploying AIOps or observability platforms.
- Familiarity with FinOps frameworks and cloud cost management tools.
- Knowledge of DevOps/DevSecOps practices, CI/CD pipelines, and IaC tooling.
- PMP, PRINCE2, or equivalent project management certification.
- Adherence to ISMS policies and procedures.
Tools & Technologies
Azure / AWS / GCP - Azure Monitor, Log Analytics, App Insights, Azure VMs, AKS, App Services, Azure AD/Entra ID, VNets, NSGs, Azure Firewall
ITSM Tools - ServiceNow, Remedy, Jira Service Management, or equivalent
Monitoring - Datadog, Dynatrace, Grafana, OpsGenie, Azure Monitor
Automation / IaC - Terraform, Ansible, Azure DevOps, GitHub Actions, PowerShell, Python
AIOps / AI Tools - Dynatrace Davis AI, ServiceNow AIOps, Copilot for Azure or equivalent
Networking - TCP/IP, DNS, VPN, Firewall (Fortinet/Palo Alto/Cisco), VNets, NSGs, Load Balancers
Reporting - Power BI, Tableau, Excel (for operational dashboards and SLA reporting)
Certifications Desired
- ITIL 4 Foundation or Managing Professional
- PMP / PRINCE2 / SAFe Agilist
- Microsoft Azure Administrator (AZ-104) or Azure Solutions Architect (AZ-305)
- AWS Certified SysOps Administrator or Solutions Architect
Work Model
- Primary work location: Thiruvananthapuram / Kochi (hybrid model applicable).
- Work during standard US/UK business hours and be available for after-hours on-call escalations.
- Expected to participate in client governance calls across global time zones (US/UK shifts as required).
- Role demands full ownership of service delivery outcomes across the managed infrastructure portfolio.
- Adherence to Experion's Information Security Management policies and procedures is mandatory.