IB CTO - Site Reliability Engineer (SRE), AVP
Position Overview
Job Title: IB CTO - Site Reliability Engineer (SRE), AVP
Location: Pune, India
Role Description
- Investment Banking is a technology-centric business, with an increasing move to real-time processing, an increasing appetite from customers for integrated systems, and access to supporting data. This means that technology is more important than ever for the business.
- The CARE Platform aims to increase the productivity of both Google Cloud and on-prem application development by providing a frictionless build and deployment platform that offers service and data reusability. The platform provides the chassis and standard components of an application, ensuring reliability, usability, and safety, and gives on-demand access to services needed to build, host, and manage applications on the cloud/on-prem.
- In addition to technology services, the platform aims to have compliance baked in, enforcing controls/security, reducing application team involvement in SDLC and ORR controls, enabling teams to focus more on application development and release to production faster.
- We are looking for a Site Reliability Engineer (SRE) to join a global team. This role will focus on ensuring the operational health, reliability, performance, and scalability of the CARE platform, encompassing GCP/on-prem infrastructure, application deployment, and the underlying CARE services. You will be instrumental in defining and implementing SRE best practices to maintain a highly available and resilient platform.
- Deutsche Bank is one of the few banks with the scale and network to compete aggressively in this space, and the breadth of investment in this area is unmatched by our peers. Joining the team is a unique opportunity to help ensure the operational excellence of a platform supporting some of our most mission-critical processing systems.
What we'll offer you
As part of our flexible scheme, here are just some of the benefits that you'll enjoy:
- Best in class leave policy
- Gender-neutral parental leave
- 100% reimbursement under childcare assistance benefit (gender neutral)
- Sponsorship for Industry relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive Hospitalization Insurance for you and your dependents
- Accident and Term life Insurance
- Complimentary health screening for employees aged 35 and above
Your key responsibilities
As a CARE SRE, you will be crucial in ensuring the continuous operation and improvement of the platform. The key responsibility areas for this role are:
- Platform Reliability and Performance: Proactively monitor, troubleshoot, and resolve issues related to platform availability, performance, and capacity on both GCP and on-prem infrastructure.
- Operational Excellence: Develop, implement, and maintain SRE best practices, including incident response, post-mortems, root cause analysis, and proactive problem prevention.
- Automation and Tooling: Drive automation efforts to reduce manual toil across operational tasks, deployment, scaling, and recovery. This includes developing and improving monitoring, alerting, and self-healing systems.
- SLA/SLO Management: Define, monitor, and report on Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for key platform services, working to continuously improve them.
- Collaboration and Support: Liaise with application teams (tenants) to understand their operational needs, provide guidance on platform best practices for reliability, and assist with complex troubleshooting.
- Infrastructure as Code (IaC): Contribute to the development and maintenance of Infrastructure as Code for platform components to ensure consistency, repeatability, and disaster recovery capabilities.
- Capacity Planning: Work with platform leads to forecast capacity requirements and ensure the platform can scale to meet business demands.
- Security and Compliance: Collaborate with security teams to ensure the platform adheres to security policies and compliance requirements, focusing on operational security aspects.
- Documentation: Create and maintain comprehensive operational documentation, runbooks, and playbooks.
Your skills and experience
- Strong understanding of SRE principles and practices, including SLOs/SLIs, incident management, post-mortems, and toil reduction.
- Deep understanding of GCP services such as GKE, IAM, identity services, CloudSQL, Cloud Monitoring, Cloud Logging, and related operational aspects.
- Extensive experience with Kubernetes and container orchestration, including configuration, troubleshooting, and performance tuning. Experience with Service Mesh (e.g., Istio) is highly desirable.
- Proficiency in Infrastructure as Code (IaC) tooling, particularly Terraform, for managing and automating infrastructure.
- Strong understanding of SDLC / DevOps best practices, with a focus on continuous integration, continuous delivery, and automated testing from an operational perspective.
- Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Splunk, Google Cloud Monitoring) and defining effective alerts and dashboards.
- Solid experience with Git and GitHub, including Git workflow, for managing code and infrastructure configurations.
- Hands-on experience with modern deployment tooling such as ArgoCD for GitOps-driven deployments and managing application lifecycles.
- Programming/scripting experience (e.g., Python, Go, Java, Bash) for automation, tooling development, and data analysis.
- Excellent problem-solving skills and the ability to diagnose and resolve complex technical issues in distributed systems.
- Experience with production support and on-call rotations in a critical environment.
How we'll support you
- Training and development to help you excel in your career
- Coaching and support from experts in your team
- A culture of continuous learning to aid progression
- A range of flexible benefits that you can tailor to suit your needs
About us and our teams
Please visit our company website for further information:
We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative and working collaboratively.
Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group.
We welcome applications from all people and promote a positive, fair and inclusive work environment.