Site Reliability Engineer

5-10 Years

Job Description

Engage with our product teams to understand requirements, design, and implement resilient and scalable infrastructure solutions
Operate, monitor, and triage all aspects of our production and non-production environments
Collaborate with other engineers on code, infrastructure, design reviews, and process enhancements
Evaluate and integrate new technologies to improve system reliability, security, and performance
Develop and implement automation to provision, configure, deploy, and monitor Apple services
Participate in an on-call rotation providing hands-on technical expertise during service-impacting events
Contribute to capacity planning, scale testing, and disaster recovery exercises
Approach operational problems with a software engineering mindset

BS degree in computer science or equivalent field with 5+ years of experience
5+ years in an Infrastructure Ops, Site Reliability Engineering, or DevOps-focused role
Knowledge of Linux operating system principles, networking fundamentals, and systems management
Demonstrable fluency in at least one of the following languages: Java, Python, or Go
Experience managing and scaling distributed systems in a public, private, or hybrid cloud environment

Preferred Qualifications

Familiarity with microservices architecture and container orchestration with Kubernetes.
Awareness of key security principles, including encryption and keys (types and exchange protocols).
Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and automation.
Strong sense of ownership, with a desire to communicate and collaborate with other engineers and teams.
Ability to identify and communicate technical and architectural problems, while working with partners and their team to iteratively find solutions.