
Title: Site Reliability Architect/Lead
Location: Hyderabad
Looking for early joiners.
Experience: 12-16 Years
Responsibilities: The Site Reliability Architect/Lead will implement and operationalize SRE practices across production systems, including defining and enforcing SLIs, SLOs, and error budgets. The role involves active participation in system- and architecture-level design decisions to ensure high availability, scalability, resilience, and performance. The individual will own observability standards, including hands-on dashboard creation, alert design, and continuous tuning to reduce false alerts. They will lead infrastructure and application deployments, ensure reliable CI/CD pipelines, drive automation to eliminate operational toil, manage incident response and RCAs, act as an escalation point during critical outages, and mentor SREs while promoting a reliability-first engineering culture.
Skill Stack: Strong hands-on experience in observability and monitoring tools such as Prometheus, Grafana, Datadog, Dynatrace, New Relic, or ELK; infrastructure and application deployment using Kubernetes and cloud platforms (AWS, Azure, or GCP); CI/CD and GitOps tools such as Helm, Argo CD, Flux, Jenkins, GitHub Actions, or GitLab CI; Infrastructure as Code using Terraform, CloudFormation, or ARM templates; and SRE automation using scripting languages such as Python, Go, or Bash/Shell. Proven experience working with distributed systems, microservices, and large-scale production environments is required.
Job ID: 145565891