Okta

Staff Site Reliability Engineer

  • Posted 17 hours ago

Job Description

Get to know Okta

Okta is The World's Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our flexible and neutral products, Okta Platform and Auth0 Platform, provide secure access, authentication, and automation, placing identity at the core of business security and growth.

At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box - we're looking for lifelong learners and people who can make us better with their unique experiences.

Join our team! We're building a world where Identity belongs to you.

What You'll Be Doing

  • Design, build, and operate highly scalable, reliable, and secure infrastructure powering our production systems across AWS and GCP.
  • Lead major reliability and modernization initiatives, including container platform migrations (e.g., ECS to EKS/GKE) and microservice enablement across multi-cloud environments.
  • Serve as a technical authority in Kubernetes (EKS and GKE), cloud infrastructure (AWS and GCP), and modern CI/CD practices (GitOps, automation pipelines).
  • Partner with development teams to architect and enable microservice-based applications, ensuring production readiness, scalability, and observability.
  • Implement and manage infrastructure as code (Terraform, Ansible) to automate provisioning, scaling, and configuration management across multiple cloud providers.
  • Drive improvements in observability, performance, and cost efficiency through robust monitoring, logging, and alerting systems that span AWS and GCP.
  • Champion SRE best practices: defining SLOs/SLIs, conducting blameless postmortems, and continuously improving incident response.
  • Lead complex technical projects from conception to completion, managing timelines and technical dependencies across teams.
  • Mentor engineers across teams, fostering a culture of reliability, automation, and continuous learning.
  • Collaborate with security and compliance partners to ensure infrastructure adheres to best practices and standards (e.g., IAM Federation, Workload Identity).
  • Participate in the on-call rotation, using incidents as learning opportunities to enhance systems and processes.

What You'll Bring to the Role:

  • Strong hands-on experience architecting and operating cloud-native distributed systems (AWS and GCP).
  • Deep expertise with Kubernetes (EKS and GKE) design, provisioning, scaling, and advanced troubleshooting in production.
  • Proven experience leading ECS to EKS/GKE migrations and driving microservice enablement initiatives at scale.
  • Proficiency with Infrastructure as Code tools such as Terraform (multi-provider), Ansible, or CloudFormation.
  • Solid coding and scripting ability in Python, Go, or Shell, with a focus on automation, tooling, and operational excellence.
  • Advanced understanding of CI/CD pipelines (ArgoCD, GitLab CI, Spinnaker), Linux systems, and networking fundamentals (Direct Connect/Interconnect, DNS, routing, load balancing).
  • Experience managing databases and caching systems (e.g., RDS/Cloud SQL, Redis/Memorystore, PostgreSQL, MySQL) in cloud environments.
  • Hands-on experience with observability tools (Prometheus, Grafana, ELK, Loki, OpenTelemetry, Google Cloud Operations) for performance and reliability insights.
  • Working knowledge of container security, secrets management (HashiCorp Vault, AWS Secrets Manager, Google Secret Manager), and compliance in production environments.
  • Strong communication and problem-solving skills, with demonstrated success leading cross-team projects and mentoring peers.

Experience:

  • 8+ years in SRE, DevOps, or Infrastructure Engineering roles.
  • 3-5 years of experience with Kubernetes (EKS/GKE) and related ecosystem tools (Helm, Karpenter, etc.) in production.
  • 3-5 years of experience with AWS and GCP.
  • 3-5 years using Terraform to manage multi-cloud infrastructure.
  • 5+ years of coding experience in Python, Go, or similar languages.
  • Proven track record leading high-impact projects, specifically migration projects (ECS to EKS/GKE) and enabling microservice architectures.
  • Experience implementing SLOs/SLIs, performing root cause analyses, and improving operational resilience.
  • Prior work in SaaS or high-scale, cloud-native environments is a strong plus.
  • Strong Linux and security fundamentals.
  • Bachelor's degree in Computer Science or equivalent hands-on experience.

What you can look forward to as a Full-Time Okta employee!

  • Amazing Benefits
  • Making Social Impact
  • Developing Talent and Fostering Connection + Community at Okta

Okta cultivates a dynamic work environment, providing the best tools, technology and benefits to empower our employees to work productively in a setting that best and uniquely suits their needs. Each organization is unique in the degree of flexibility and mobility in which they work so that all employees are enabled to be their most creative and successful versions of themselves, regardless of where they live. Find your place at Okta today! https://www.okta.com/company/careers/.

Some roles may require travel to one of our office locations for in-person onboarding.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.

Okta is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Personnel and Job Candidate Privacy Notice at https://www.okta.com/legal/personnel-policy/.

Job ID: 143769341
