Roles and Responsibilities:
- Incident Management: Manage and coordinate incident response: assess and prioritize incidents, communicate effectively with stakeholders, and guide response efforts.
- Recovery Process Management: Oversee the recovery process based on best practices and ITIL standards.
- Process Standardization: Ensure consistent application of the incident management process.
- Team Coordination: Oversee roles and responsibilities of team members, ensuring they understand and fulfil their roles.
- Incident Assessment and Prioritization: Assess situations, prioritize actions, determine the extent of incidents, identify necessary resources, and deploy personnel efficiently.
- Reporting: Create weekly and monthly reports based on client SLAs and KPIs. Conduct daily, weekly, and monthly health checks on logged incidents.
- Communication: Act as the central communication point for major incidents. Communicate with internal and external stakeholders to provide accurate and timely updates.
- Incident Response Procedures: Ensure well-documented incident response procedures are in place and followed.
- Service Improvement: Drive continuous improvement of the incident management process and the ITSM incident module. Identify process improvement opportunities and lead the efforts to implement them.
Skills Required:
- Strong working experience with application support and service management tools: ServiceNow, JIRA.
- Strong knowledge of issue resolution and escalation practices.
- Sound knowledge of at least one public cloud platform and of the Dynatrace and Grafana monitoring tools.
- Experience: Minimum of 10 years of experience in IT operations, incident management, and leadership roles, including at least 5 years managing production operations on Google Cloud Platform.
- Technical Expertise: Strong technical background with expertise in Google Cloud Platform infrastructure, systems, and applications. Proficiency in troubleshooting and resolving complex technical issues on GCP.
- Leadership Skills: Proven ability to lead and manage a team, with excellent interpersonal and communication skills.
- Problem-Solving: Exceptional problem-solving skills and the ability to make quick, effective decisions under pressure.
- Process Improvement: Experience in implementing process improvements and best practices in GCP operations.
- Communication: Strong written and verbal communication skills, with the ability to convey technical information to non-technical stakeholders.
- Documentation: Proficiency in maintaining accurate records and generating detailed reports.
- Compliance and Security: Knowledge of IT compliance standards and security protocols, specifically related to GCP.
- Analytical Skills: Ability to conduct root cause analysis and develop preventive measures.
Preferred Qualifications:
- Certifications: GCP Professional Cloud Architect, GCP Professional Cloud Security Engineer, ITIL, PMP, CISSP