Site Reliability Engineer

  • Posted 19 hours ago

Job Description

We are looking for a Senior Site Reliability Engineer (SRE) with deep expertise in observability, cloud-native infrastructure, and large-scale distributed systems. This is a hands-on role focused on designing, building, and operating reliable, observable, and scalable platforms on Kubernetes, preferably on Google Cloud Platform (GCP), with AWS as an alternative.

Roles & Responsibilities

Reliability & Operations

- Design, implement, and maintain highly available and resilient systems in Kubernetes-based environments

- Define and enforce SLOs, SLIs, and error budgets

- Lead incident response, RCA, and postmortems

- Drive reliability improvements through automation

Observability (Core Focus)

- Architect and operate observability platforms for metrics, logging, tracing, and alerting

- Work with Prometheus, Alertmanager, OpenTelemetry, Grafana, Loki / ELK / OpenSearch

- Implement cloud-native monitoring (GCP Cloud Monitoring & Logging preferred)

- Establish actionable alerting standards

Cloud & Platform Engineering

- Build and manage infrastructure on GCP (preferred) or AWS

- Operate Kubernetes clusters (GKE preferred)

- Deploy services using Helm

- Manage containerized workloads using Docker

Automation & Tooling

- Develop automation and tooling in Python (strong Python skills required), with emphasis on reliability and observability

- Create internal reliability and monitoring tools

- Integrate CI/CD pipelines with observability and reliability checks

Collaboration & Leadership

- Mentor junior engineers

- Influence architecture decisions

- Collaborate across engineering teams

More Info

Job ID: 147202815
