Senior Site Reliability Engineer III - Ansible/Terraform

Experience: 6-8 years
  • Posted 2 days ago

Job Description

Responsibilities:

  • Define and enforce SLOs, SLIs, and error budgets across microservices
  • Architect an observability stack (metrics, logs, traces) and drive operational insights
  • Automate toil and manual ops with robust tooling and runbooks
  • Own incident response lifecycle: detection, triage, RCA, and postmortems
  • Collaborate with product teams to build fault-tolerant systems
  • Champion performance tuning, capacity planning, and scalability testing
  • Optimise costs while maintaining the reliability of cloud infrastructure

Must-have Skills:

  • 6+ years in SRE, infrastructure, or backend-related roles using cloud-native technologies
  • 2+ years in an SRE-specific capacity
  • Strong experience with monitoring/observability tools (Datadog, Prometheus, Grafana, ELK, etc.)
  • Experience with infrastructure-as-code (Terraform/Ansible)
  • Proficiency in Kubernetes, service mesh (Istio/Linkerd), and container orchestration
  • Deep understanding of distributed systems, networking, and failure domains
  • Expertise in automation with Python, Bash, or Go
  • Proficiency in incident management, SLAs/SLOs, and system tuning
  • Hands-on experience with GCP (preferred), AWS, or Azure, and with cloud cost optimisation
  • Participation in on-call rotations and running large-scale production systems

Nice-to-have Skills:

  • Familiarity with chaos engineering practices and tools (Gremlin, Litmus)
  • Background in performance testing and load simulation (Gatling, Locust, k6, JMeter)

More Info

Open to candidates from: Indian

About Company

At GreyOrange, we're shaping the future of warehouse orchestration and store inventory management through our flagship solutions:

  • GreyMatter: Our hyper-intelligent warehouse orchestration platform that optimizes automation, inventory and workforce management in real time
  • gStore: Intuitive store inventory management software ensuring optimal product availability while enhancing the worker and customer experiences

What sets us apart is our ability to provide real-time visibility across all omnichannel nodes while seamlessly orchestrating robotic agents, people, inventory and systems. Our customers reduce their cost per unit, eliminate lost inventory, enhance worker productivity and safety, and elevate in-store experiences. As a vendor-agnostic solution compatible with diverse automation hardware through our Certified Ranger Network, we deliver rapid, significant results through our global Certified Partner Network of system integrators.

Founded in 2012 | Headquartered in Atlanta with a global presence across the Americas, Europe and Asia

Job ID: 117932919