HyperVerge

Site Reliability Engineer

  • Posted 10 hours ago

Job Description

Role Overview

We are looking for an SRE who doesn't just maintain systems but builds them. You won't be stuck in a traditional support loop; instead, you will focus on the reliability, scalability, and automation of our cloud-native ecosystem. The ideal candidate has a developer-first mindset, using code to solve infrastructure bottlenecks and keeping our AWS and Kubernetes environments rock-solid.

Key Responsibilities

  • Infrastructure as Code (IaC): Design and deploy scalable AWS environments using Terraform, CloudFormation, or Pulumi. No manual clicks.
  • Kubernetes Orchestration: Manage and optimize EKS (Elastic Kubernetes Service) clusters, including ingress controllers, service meshes, and autoscaling.
  • Reliability Engineering: Implement self-healing infrastructure by writing automation scripts in Python or Go.
  • Observability: Build deep-visibility dashboards and alerting systems using Prometheus, Grafana, or Datadog to proactively catch issues before they hit users.
  • CI/CD Mastery: Own and optimize deployment pipelines (Jenkins, GitLab CI, or GitHub Actions) to ensure zero-downtime releases.
  • Security & Compliance: Ensure the infrastructure follows the Principle of Least Privilege using AWS IAM and network security best practices.

Technical Requirements

  • AWS Expertise: 3+ years of hands-on experience with core services (EC2, S3, RDS, Lambda, VPC, IAM).
  • Containerization: Strong experience with Docker and production-grade Kubernetes management.
  • Scripting/Coding: Proficiency in Python or Go for building internal tools and automating repetitive tasks.
  • Linux Internals: Strong command of Linux/Unix administration, networking (TCP/IP, DNS), and troubleshooting.
  • Configuration Management: Experience with Ansible, Chef, or Puppet to maintain system state.

Job ID: 147252767
