NatWest Group

Site Reliability Engineer (AWS & Kubernetes)

  • Posted 22 hours ago

Job Description

Join us as a Site Reliability Engineer

  • In this key role, you'll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services
  • You'll enjoy significant stakeholder interaction, collaborating with engineers to ensure a principled approach to delivering change safely and securely
  • This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development
  • We're offering this role at associate level

What you'll do

As our Site Reliability Engineer, you'll contribute to the reliability, monitoring and operational excellence of cloud-native platforms.

You'll work closely with senior engineers to support production systems, implement SRE practices, and ensure services are observable, scalable and resilient. You'll also participate in the 24/7 support and on-call rotation, gaining experience in incident response and platform operations.

You'll Also Be

  • Supporting the operation of AWS-based Kubernetes platforms (EKS)
  • Contributing to monitoring, alerting and observability implementations using tools like Grafana and Prometheus
  • Assisting in incident management, troubleshooting and root cause analysis
  • Participating in on-call rotations and production support activities
  • Implementing infrastructure changes using Terraform and GitOps workflows
  • Supporting CI/CD pipelines (GitLab, Argo CD) and deployment processes
  • Helping improve system reliability through automation and operational improvements
  • Following SRE practices such as runbooks, documentation and post-incident reviews
  • Working with DevOps and engineering teams to improve system performance and stability
  • Ensuring solutions align with security, compliance and operational standards

The skills you'll need

We're looking for an engineer with solid foundational experience in cloud platforms and a keen interest in reliability engineering and production operations.

You'll Also Need

  • Experience working with AWS and Kubernetes (EKS) in a production or pre-production environment
  • Familiarity with monitoring and observability tools such as Grafana and Prometheus
  • Understanding of CI/CD pipelines and Git-based workflows (GitLab preferred)
  • Exposure to Terraform or infrastructure-as-code concepts
  • Basic understanding of SRE practices and production support models
  • Experience troubleshooting applications or infrastructure issues
  • Awareness of networking and security fundamentals in cloud environments
  • Willingness to participate in on-call rotations and incident response
  • Strong problem-solving mindset and eagerness to learn
  • Good communication and collaboration skills

Job ID: 148903693

Gurugram, Gurugram, India

Skills:

Prometheus, Grafana, Terraform, Kubernetes, AWS, error budgets, SLIs, Karpenter, GitOps, Loki, Kubernetes networking, EKS, Argo CD, Cilium, Tempo, toil reduction, SLOs, SRE principles