HyperVerge

Senior Site Reliability Engineer

  • Posted 3 days ago

Job Description

Site Reliability Engineer II (SRE 2)

About the Role:

As a Site Reliability Engineer II, you will bridge the gap between development and operations to ensure our cloud-native AWS ecosystem is scalable, highly available, and self-healing. You aren't just managing infrastructure; you are treating operational challenges as engineering problems. You will own production reliability, participate in on-call rotations, design CI/CD pipelines, and leverage modern AI-driven automation to proactively prevent system degradation.

Key Responsibilities:

● Infrastructure as Code (IaC): Design, deploy, and maintain scalable AWS environments using Terraform, CloudFormation, or Pulumi. Ensure zero-drift, no-manual-clicks infrastructure.

● Kubernetes Orchestration: Manage, scale, and optimize AWS EKS clusters, including controllers, service meshes (e.g., Istio, Linkerd), and cluster autoscaling.

● Reliability Engineering & Incident Response: Lead incident mitigation, participate in follow-the-sun on-call rotations, conduct blameless post-mortems, and champion high-availability practices.

● Observability: Build deep-visibility dashboards and proactive alerting topologies using Prometheus, Grafana, or Datadog to catch anomalies before they impact users.

● CI/CD & Security: Own and optimize deployment pipelines (GitHub Actions, GitLab CI, or Jenkins) for zero-downtime releases.

Requirements:

  • 3 to 5 years of dedicated experience as an SRE with DevOps exposure.
  • Highly comfortable writing clean, production-level code in at least one language.
  • SLIs, SLOs, error budgets, post-mortems, and reducing operational toil.
  • Scale & Complexity: Metrics indicating scale


Job ID: 149368031
