
We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving automation, monitoring, and incident response strategies. This position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.
Job Description
Reliability & Performance:
Incident Management & Response:
Automation & Tooling:
Collaboration:
Leadership & Team Building:
Drive performance reviews, skills development, and career progression for team members
Job ID: 147470907
Skills:
CI/CD – GitHub, S3, Orchestration, RDS, AWS Services, Cloud networking, Configuration management, Prometheus, CloudWatch, Lambda, EFS, Terraform, Dynatrace, Kubernetes, Scripting – Python, Cloud migrations, Version control – Git, Application and Infrastructure Delivery automation, EKS, EBS, Storage Solutions
Skills:
CloudWatch, Docker, Datadog, Kubernetes, Python, AWS, AI productivity tools, Go, incident.io
Skills:
Terraform, Incident Response, Ansible, Helm, Kubernetes, AWS, Linux systems administration
Skills:
Training material, Azure, Blameless Post Mortems, Infrastructure as Code, Root Cause Analysis, Run Books, Java Applications
Skills:
Windows Server, SaaS, OpenShift, kdb, Grafana, MSSQL, ITRS, New Relic, Geneos, GCP, Terraform, Ansible, Netcool, Distributed Systems, Oracle, Kubernetes, Error budgets, Unix servers, Incident governance, OpenTelemetry, Telemetry pipelines, SLOs, Observability tools