
Who We Are:
For enterprises struggling to secure cloud workloads, Aviatrix offers a single solution for pervasive cloud security. Where current cybersecurity approaches focus on securing entry points to a trusted space, Aviatrix Cloud Native Security Fabric (CNSF) delivers runtime security and enforcement within the cloud application infrastructure itself, closing gaps between existing solutions and helping organizations regain visibility and control. Aviatrix ensures that security, cloud, and networking teams are empowering developer velocity, AI, serverless, and what's next. For more information, visit www.aviatrix.com.
About the Role
As a Staff Engineer - Site Reliability Engineering, you'll work independently, taking ownership of significant components and systems. You'll mentor junior contributors while driving technical excellence and reliability improvements across the platform.
Responsibilities
Kubernetes: Design and implement complex application lifecycle management, custom operators, and advanced troubleshooting
Infrastructure as Code: Architect comprehensive IaC solutions with advanced configurations and module development
Automation & Development: Design and build sophisticated automation frameworks and tools in Golang and Python
Component Ownership: Take full ownership of significant system components with responsibility for their reliability and performance; define SLA targets and drive achievement
Architecture & Design: Design components and systems with moderate complexity, evaluating tradeoffs and writing comprehensive design documents
Reliability Engineering: Contribute changes that significantly improve product security, quality, ease of operation, reliability, and performance. Own reliability for major platform components
Automation Excellence: Implement automation frameworks that scale across teams; eliminate entire classes of manual work
Observability: Design comprehensive observability strategies; implement advanced monitoring and distributed tracing
Incident Management: Lead major incident response; establish incident management processes and runbooks
Performance Engineering: Drive system-wide performance improvements; establish performance engineering practices
Collaboration: Lead technical discussions with product engineering; influence architecture decisions across teams
Technical Leadership: Mentor junior engineers and provide technical guidance on complex problems; drive improvements to engineering processes and operational procedures
Qualifications
Experience: 6+ years of experience with a BS in a designated Engineering or related field
Advanced Technical Skills: Expert-level proficiency in Golang and Python with system design experience
Cloud Architecture: Extensive experience designing and implementing cloud-native solutions
Infrastructure as Code: Deep understanding of Terragrunt/Terraform, including advanced features and best practices
Kubernetes: Advanced Kubernetes knowledge including operators and Helm
Observability: Expert knowledge of monitoring, logging, and observability solutions
Leadership: Demonstrated ability to mentor others and lead technical initiatives
Communication: Excellent written and verbal communication skills for design documentation and presentations
Job ID: 142409649