Lead end-to-end production support for critical applications, ensuring high availability, performance, and reliability across distributed environments.
Own and manage major incidents, including impact assessment, triage, coordination with multiple teams, communication, and timely resolution.
Analyze recurring incidents and production issues to identify root causes and drive permanent fixes and preventive measures.
Define and implement production support standards, runbooks, and operational procedures aligned with ITIL best practices.
Collaborate with development and architecture teams to design supportable, scalable, and resilient solutions for new and existing applications.
Oversee change, release, and deployment activities to minimize production risk and ensure smooth transitions into live environments.
Monitor system health using appropriate tools, define alert thresholds, and proactively address performance and capacity issues.
Provide technical leadership and guidance to support teams, including mentoring, knowledge sharing, and continuous skills enhancement.
Partner with business stakeholders to understand service expectations, prioritize issues, and ensure adherence to SLAs and OLAs.
Prepare and present incident reports, trend analyses, and improvement recommendations to senior management and key stakeholders.
Minimum Qualifications:
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field (e.g., B.Tech, B.E., B.Sc., BCA, M.Tech, M.E., M.Sc., MCA).
8–15 years of hands-on experience in production support or application support roles within enterprise environments.
Strong expertise in production operations and support processes, including incident, problem, and change management.
Proven experience applying ITIL principles in day-to-day production support and service management activities.
Demonstrated ability to troubleshoot complex issues across distributed systems and coordinate resolution with multiple teams.
Solid understanding of service-level management, monitoring, and escalation procedures in high-availability environments.