Manager Site Reliability Engineering

Sabre

Bengaluru, India

8-10 Years

Posted 9 days ago

Job Description

About Sabre

Powering the agentic revolution in travel. Sabre is an AI-native technology leader, backed by one of the world's largest travel data clouds. Built on an open, modular, cloud-native architecture, Sabre serves as the backbone for both established leaders and bold, new disruptors, guiding them to the next age of travel retailing through intelligent, connected, and personalized experiences. With AI at its core and operating at unparalleled scale, Sabre transforms insights into innovation, empowering airlines, hoteliers, agencies, and other partners to retail, distribute, and fulfil travel worldwide.

This role requires a strong blend of people leadership, stakeholder management, technical depth, and communication excellence to deliver reliable platforms and measurable business outcomes.

Team Description

The Connectivity SRE team is responsible for the reliability, availability, performance, and cost efficiency of mission-critical connectivity platforms operating across hybrid and cloud environments (GCP: GKE/GCE). The team partners closely with Engineering, Product, Network/Infrastructure, Security, Capacity, and external vendors to ensure resilient services that support Sabre's core business.

Role Summary

As a Site Reliability Engineering Manager, you will lead a globally distributed SRE team responsible for the reliability and operational excellence of mission-critical connectivity platforms and applications. You will balance people leadership with operational ownership, technical oversight, and cross-regional collaboration.

This is a hands-on leadership role focused on reliability engineering and SRE maturity, owning on-call strategy, incident leadership, SLO/error budgets, disaster recovery readiness, observability, toil reduction, security compliance, and cost optimization, while driving cross-functional execution and continuous improvement.

Key Responsibilities

Own production reliability for connectivity services, including SLO and error-budget management, proactive production health monitoring, and continuous improvement.
Lead 24x7 on-call operations and major incident response, including rotation design, escalation paths, incident leadership, and blameless post-incident reviews.
Own operational execution and work intake, including prioritization, assignment, and tracking of work items (e.g., Jira/Rally) to ensure timely and reliable delivery.
Ensure systems are secure, compliant, and resilient, including OS/platform patching, vulnerability remediation, configuration compliance, and PCI audit readiness, in partnership with Security and Compliance teams.
Maintain disaster recovery readiness, including RTO/RPO posture, testing cadence, and remediation of identified DR gaps.
Drive SRE best practices, including observability (metrics, logs, traces), alert hygiene, automation, toil reduction, and standardized runbooks and readiness reviews.
Own production change governance, including review and approval of changes (e.g., ServiceNow), ensuring appropriate risk assessment, rollback plans, and cross-team coordination to prevent production impact.
Collaborate with engineering teams to embed reliability by design into architectures, releases, and change management practices.
Lead, coach, and develop a globally distributed SRE team, establishing clear ownership models, supporting career growth, and fostering a culture of accountability and continuous learning.
Act as the primary SRE partner for Engineering, Product, Network/Infrastructure, Security, Capacity, and key vendors, driving cross-functional initiatives such as modernization efforts, DR drills, observability improvements, and cost/capacity optimization.

Qualifications

Required

8-10+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
3+ years of experience as a people manager, leading engineers or SRE teams.
Proven experience running 24x7, large-scale production systems with strong incident management and on-call leadership.
Hands-on experience with GCP (or other major cloud), Kubernetes/GKE, Linux, and networking fundamentals.
Strong depth in monitoring and observability (e.g., Grafana, Splunk, AppDynamics, or equivalents) and reliability governance (SLOs, error budgets).
Strong stakeholder management skills, with the ability to communicate clearly with senior engineering and business partners.
Bachelor's degree in Computer Science, Engineering, or equivalent experience.

Preferred

Experience leading cloud migrations and platform modernization initiatives.
Demonstrated outcomes in cost optimization and capacity planning.
Familiarity with CI/CD pipelines and change-risk controls in high-availability or regulated environments.
Experience supporting security compliance and audit requirements (e.g., PCI).
Experience leading or collaborating with globally distributed teams across multiple time zones.