
Job Description: Site Reliability Engineering (SRE) Manager
Role Overview:
We are looking for an experienced SRE Manager to lead our Site Reliability Engineering team. The ideal candidate will have a strong background in DevOps practices, system reliability, and team leadership.
Key Responsibilities:
- Lead, mentor, and manage a team of SRE/DevOps engineers
- Define and implement SRE best practices (SLIs, SLOs, error budgets)
- Ensure system reliability, scalability, and performance
- Drive automation initiatives
- Collaborate with cross-functional teams
- Own CI/CD pipelines and release management
- Lead incident response and RCA processes
- Establish monitoring and observability frameworks
- Manage cloud infrastructure (AWS/Azure/GCP)
- Implement disaster recovery plans
Required Skills & Qualifications:
- 7+ years of experience in SRE/DevOps roles
- 3+ years of team management experience
- Experience with cloud platforms (AWS/Azure/GCP)
- Knowledge of CI/CD tools (Jenkins, GitLab CI)
- Experience with Docker and Kubernetes
- Scripting skills (Python, Bash)
- Knowledge of Terraform/CloudFormation
- Experience with monitoring tools (Prometheus, Grafana, ELK)
Preferred Qualifications:
- Experience with microservices
- Cloud certifications are a plus
- Strong problem-solving skills
Key Competencies:
- Leadership
- Communication
- Ownership
- Stakeholder management
Good to Have:
- Experience in e-commerce platforms
- Knowledge of chaos engineering
Job ID: 147206893
Skills:
Site Reliability Engineering, DevOps, Cloud Infrastructure, AWS, Azure, GCP, Docker, Kubernetes, Terraform, CloudFormation, Jenkins, GitLab CI, Prometheus, Grafana, ELK, Python, Bash
