
Job Title: Site Reliability Engineer (SRE) AWS
Experience: 8+ years
Location: Chennai / Mumbai
Work Mode: Hybrid
Key Skills: AWS, Terraform, Kubernetes, Docker, Grafana, Prometheus, Datadog
Job Summary:
We are looking for a skilled Site Reliability Engineer (SRE) with strong AWS experience and a solid background in DevOps, automation, observability, and large-scale distributed systems.
Responsibilities:
Manage and optimize cloud infrastructure using AWS IaaS.
Implement SRE practices to enhance reliability, performance, and SDLC efficiency.
Build and maintain CI/CD pipelines (Jenkins, GitLab) and infrastructure as code (Terraform).
Work with containers and orchestration (Docker, ECS, Kubernetes).
Troubleshoot performance, networking, and distributed system issues.
Drive DevOps and QA best practices across teams.
Implement observability: SLIs/SLOs, error budgets, monitoring, logging, tracing, and alerting.
Lead incident resolution and perform root cause analysis (RCA).
Automate tasks using Python/Bash/PowerShell.
Work with minimal supervision and collaborate effectively with cross-functional teams.
Qualifications:
Strong AWS cloud experience
Proven experience implementing DevOps and SRE practices
Good understanding of Linux, networking, and distributed systems
Hands-on experience with observability tools
Strong scripting and automation expertise
Excellent communication and teamwork skills
Job ID: 133342111