
NatWest Group

Site Reliability Engineer

  • Posted 5 hours ago

Job Description

Join us as a Site Reliability Engineer

  • In this key role, you'll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services
  • You'll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to deliver change in a safe and secure way
  • This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development
  • We're offering this role at senior analyst level

What you'll do

As a Site Reliability Engineer, you'll be supporting colleagues and feature team members to meet defined service level objectives and continually improve systems and environments. You'll also be proactively contributing new ideas and innovations to meet short-term and longer-term goals while balancing and managing risk.

You'll also be accountable for the day-to-day health of both production and non-production environments, including responding to incidents.

A Typical Day Will Involve

  • Ensuring service availability
  • Proactively monitoring the production environment
  • Completing root cause analysis of issues

The skills you'll need

We're looking for someone with at least four years of experience in incident, problem and change management, paired with production support experience. You'll need a cloud environment skillset, as well as experience of monitoring and of creating Splunk or DX-APM dashboards.

Additionally, You'll Need Experience Of

  • Working with AWS services such as EC2, EKS or ECS, Lambda, RDS and S3, along with Python automation, FastAPI and MongoDB.
  • Managing highly available production systems, participating in on-call rotations, troubleshooting incidents, and performing root cause analysis.
  • Building automated operational workflows and supporting CI/CD pipelines, monitoring, alerting, and incident management.
  • Working closely with engineering teams to improve system reliability and performance; exposure to Kubernetes, Terraform, observability tools, and AI-driven automation is a plus.

Job ID: 147481581
