

Site Reliability Engineer II (SRE 2)
About the Role:
As a Site Reliability Engineer II, you will bridge the gap between development and operations to ensure our cloud-native AWS ecosystem is scalable, highly available, and self-healing. You aren't just managing infrastructure; you are treating operational challenges as engineering problems. You will own production reliability, participate in on-call rotations, design CI/CD pipelines, and leverage modern AI-driven automation to proactively prevent system degradation.
Key Responsibilities:
● Infrastructure as Code (IaC): Design, deploy, and maintain scalable AWS environments using Terraform, CloudFormation, or Pulumi. Ensure zero-drift infrastructure with no manual clicks.
● Kubernetes Orchestration: Manage, scale, and optimize AWS EKS clusters, including
controllers, service meshes (e.g., Istio, Linkerd), and cluster autoscaling.
● Reliability Engineering & Incident Response: Lead incident mitigation, participate in follow-the-sun on-call rotations, conduct blameless post-mortems, and champion high-availability practices.
● Observability: Build deep-visibility dashboards and proactive alerting topologies using Prometheus, Grafana, or Datadog to catch anomalies before they impact users.
● CI/CD & Security: Own and optimize deployment pipelines (GitHub Actions, GitLab CI, or Jenkins) for zero-downtime releases.
Requirements:
Job ID: 149368031
Skills:
Java, Prometheus, Grafana, Datadog, SQL, Spring, NoSQL, Jenkins, GCP, Terraform, GitLab, Helm, Azure, Kubernetes, AWS, GKE, AKS, Chaos Engineering tools, EKS, LLM-based tools, Machine Learning techniques