3+ years of experience operating and owning the end-to-end availability and performance of a large-scale production environment.
Excellent understanding of cloud environments and technologies, especially AWS.
Solid understanding of containerization and microservices architecture.
Hands-on experience with the Kubernetes (K8s) ecosystem and an understanding of its best practices.
Ability to dissect complex problems into simple sub-problems and use available solutions to resolve them.
Excellent understanding of cloud environments and technologies, especially AWS, Azure, and GCP.
Hands-on experience in Python/Go.
Understanding of auto-remediation and alert correlation frameworks.
Understanding of SLOs/SLIs, error budgets, and KPIs for highly critical services.
Experience with logging and metrics platforms at scale.
Proven strengths in identifying, mitigating, and root-causing issues while continuously seeking ways to drive optimization, efficiency, and the bottom line.