Position Overview
Job Title: Site Reliability Engineer - VP
Location: Pune, India
Role Description
- We are seeking a Site Reliability Engineer – Observability to build and scale our enterprise observability capability. This role focuses on instrumentation, monitoring, and telemetry platforms to provide end-to-end visibility across services.
- You will own and drive enterprise-wide reliability governance, ensuring that systems operate with consistent SLO standards, strong production controls, and audit-ready processes, and act as the central control tower for reliability across all platforms.
What We'll Offer You
As part of our flexible scheme, here are just some of the benefits that you'll enjoy:
- Best-in-class leave policy
- Gender-neutral parental leave
- 100% reimbursement under the childcare assistance benefit (gender-neutral)
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive hospitalization insurance for you and your dependents
- Accident and term life insurance
- Complimentary health screening for employees aged 35 and above
Your Key Responsibilities
Reliability Governance
- Define and own enterprise SLO/SLI framework aligned to service criticality
- Establish and enforce error budget governance policies
- Standardize reliability KPIs and reporting
Production Controls
- Define Production Readiness Review (PRR) / Production Certification (PRC) standards covering:
  - Observability coverage (metrics, logs, traces)
  - Alert quality (actionable, low-noise)
  - Runbooks & recovery readiness
- Govern release readiness across teams
Incident Governance
- Own the incident management framework (severity, escalation, response)
- Define RCA standards, SLAs, and quality benchmarks
- Ensure traceability (alert → incident → RCA → remediation)
- Oversee major incidents and systemic risks
Risk & Audit Alignment
- Drive adoption of SRE practices across engineering
- Provide frameworks, playbooks, and guidance
- Conduct reliability reviews with leadership
Skills & Experience
- Strong SRE / production engineering leadership experience
- Expertise in SLOs, error budgets, incident governance, and observability
- Experience with distributed systems, cloud, and Kubernetes
- Strong understanding of risk, audit, and compliance (financial services preferred)
- Own and enforce reliability as a governed, measurable, and audit-ready capability across the enterprise.
Your Skills And Experience
- Strong understanding of how to correlate metrics, logs, and traces
- Programming and systems: Python, Linux
- Familiarity with monitoring tools
How We'll Support You
- Training and development to help you excel in your career
- Coaching and support from experts in your team
- A culture of continuous learning to aid progression
- A range of flexible benefits that you can tailor to suit your needs
About Us And Our Teams
Please visit our company website for further information:
https://www.db.com/company/company.html
We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative and working collaboratively.
Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group.
We welcome applications from all people and promote a positive, fair and inclusive work environment.