SolarWinds

Senior Manager, Site Reliability Engineering (SRE)

This job is no longer accepting applications

  • Posted a day ago

Job Description

At SolarWinds, we're a people-first company. Our purpose is to enrich the lives of the people we serve, including our employees, customers, shareholders, partners, and communities. Join us in our mission to help customers accelerate business transformation with simple, powerful, and secure solutions.

The ideal candidate thrives in an innovative, fast-paced environment and is collaborative, accountable, ready, and empathetic. We're looking for individuals who believe they can accomplish more as a team and create lasting growth for themselves and others. We hire based on attitude, competency, and commitment. Solarians are ready to advance our world-class solutions in a fast-paced environment and accept the challenge to lead with purpose. If you're looking to build your career with an exceptional team, you've come to the right place. Join SolarWinds and grow with us!

Role Overview:

SolarWinds is looking for a Senior Manager, Site Reliability Engineering (SRE) to lead reliability, scalability, and operational excellence for large-scale, cloud-native, data-intensive SaaS platforms.

This role combines people leadership, technical depth, and operational ownership. You will manage and grow SRE teams responsible for production systems, while remaining close to architecture, platform reliability, incident response, and automation strategy. The ideal candidate has operated complex distributed systems at scale and knows how to balance availability, performance, velocity, and cost.

Responsibilities:
  • Lead and mentor SRE teams responsible for the reliability, availability, and performance of mission-critical SaaS platforms
  • Own and drive production reliability outcomes, including uptime, latency, capacity, scalability, and operational readiness
  • Oversee data-intensive distributed systems, including technologies such as ClickHouse, Kafka, ZooKeeper, MySQL, Redis, and Flink
  • Guide and review Kubernetes platform operations at scale, including cluster lifecycle management, upgrades, and capacity planning
  • Establish and evolve SRE best practices, including SLIs/SLOs, alerting strategy, incident management, and post-incident reviews
  • Promote and enforce an automation-first approach, reducing manual toil through scripting, tooling, and platform improvements
  • Partner closely with Engineering, Platform, Product, and Security teams to embed reliability into system design and delivery
  • Drive adoption of GitOps, service mesh, and observability standards across teams
  • Lead cloud infrastructure operations across AWS and Azure, ensuring secure, resilient, and cost-effective usage
  • Participate in and oversee on-call and incident response practices, ensuring clear ownership, fast recovery, and continuous improvement

Must-Have Qualifications:
  • Proven experience leading SRE, Platform, or Infrastructure teams supporting production, customer-facing SaaS systems
  • Strong hands-on Kubernetes experience in large-scale production environments, including:
      • Cluster operations and lifecycle management
      • Autoscaling and resilience mechanisms (HPA, VPA, KEDA, Cluster Autoscaler, Pod Disruption Budgets, Goldilocks)
      • Observability and monitoring (Prometheus, Grafana)
  • Experience operating distributed, data-intensive systems such as ClickHouse, Kafka, ZooKeeper, MySQL, Redis, or Flink
  • Practical experience with GitOps and service mesh technologies, including Flux, Kustomize, and Istio
  • Strong automation mindset, with hands-on experience using Python and/or Go to improve reliability and reduce operational overhead
  • Extensive experience working with AWS and Azure managed services, including EKS/AKS, Aurora, ElastiCache, storage services, load balancers, VPC, and KMS
  • Demonstrated ownership of incident management, root cause analysis, and long-term remediation
  • Ability to communicate clearly and collaborate effectively with engineering leadership and cross-functional teams

SolarWinds is an Equal Employment Opportunity Employer. SolarWinds will consider all qualified applicants for employment without regard to race, color, religion, sex, age, national origin, sexual orientation, gender identity, marital status, disability, veteran status or any other characteristic protected by law.

All applications are treated in accordance with the SolarWinds Privacy Notice: https://www.solarwinds.com/applicant-privacy-notice

Job ID: 143306069