Snapmint

Senior Site Reliability Engineer

  • Posted 3 days ago

Job Description

Senior Site Reliability Engineer (SRE)

Summary

We are looking for a Senior Site Reliability Engineer (SRE) to build and operate scalable, reliable, and secure platform infrastructure. The ideal candidate will drive automation, observability, incident management, and cloud-native best practices to improve system reliability and operational excellence across distributed systems.

Roles & Responsibilities

  • Define and manage SLIs, SLOs, and error budgets for critical services
  • Design and enhance monitoring, logging, alerting, and tracing capabilities
  • Automate operational processes and improve platform efficiency
  • Participate in incident response, root cause analysis (RCA), and postmortem reviews
  • Support production environments through on-call rotations and reliability initiatives
  • Improve system performance, scalability, availability, and capacity planning
  • Collaborate with engineering teams to enhance application resiliency and operational readiness
  • Drive adoption of Infrastructure as Code (IaC) and CI/CD best practices
  • Maintain highly available, fault-tolerant, and secure cloud infrastructure

Skills

  • Strong Linux/Unix administration and debugging skills
  • Proficiency in Python/Bash/Shell scripting and automation
  • Expertise in observability and monitoring tools such as Grafana, Prometheus, ELK, and New Relic
  • Strong expertise in AWS and cloud infrastructure management
  • Strong experience with log analysis and monitoring using ELK
  • Strong incident management, communication, and operational excellence mindset
  • Hands-on experience with Kubernetes, Docker, and container orchestration
  • Experience with Terraform and Infrastructure as Code practices
  • Strong understanding of networking, DNS, load balancing, and distributed systems
  • Experience with CI/CD tools such as Jenkins, GitHub Actions, GitLab CI, or ArgoCD

Qualifications

  • B.Tech/B.E. or equivalent
  • 4+ years of experience in SRE, DevOps, Platform Engineering, or Systems Engineering

Good to Have

  • Bachelor's degree in Computer Science, Engineering, or a related field
  • Cloud or Kubernetes certifications
  • Experience managing production incidents in high-availability environments
  • Exposure to multi-cloud architectures (AWS/GCP/Azure)

Job ID: 149320387
