Site Reliability Engineering (SRE) at Equifax is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to Equifax engineering principles.
Our SRE culture of diversity, intellectual curiosity, problem-solving, and openness is key to our success. Equifax brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, and we strive to build an environment that provides the support and mentorship needed to learn, grow, and take pride in our work.
What You'll Do
- Define, maintain, and report on SLAs, SLOs, and SLIs in partnership with product and architecture teams.
- Provide deep troubleshooting leadership; lead the resolution of production issues under pressure and perform Root Cause Analysis (RCA).
- Lead with a data-driven mindset, focusing on optimizing GKE/Cloud environments and maintaining QE, DevSec, and FinOps KPIs.
- Drive Infrastructure-as-Code (IaC) strategy using Terraform and Helm to ensure scalable, repeatable deployments.
- Collaborate on implementation architecture decisions, focusing on refactoring and EOSL (End of Service Life) planning.
- Coach junior engineers on reliability best practices and secure software development guidelines.
What Experience You Need
- Bachelor's Degree in Computer Science or equivalent.
- 7+ years of Software Engineering experience, including at least 3 years focused on Site Reliability or DevOps.
- Expertise in Cloud: Proven experience with GCP (preferred) or AWS, specifically with Managed Kubernetes (GKE/EKS).
- Coding Proficiency: Strong ability to read and debug Java and Spring Boot code; proficiency in scripting (Python or Go) for automation.
- CI/CD & IaC: Hands-on experience with Jenkins pipelines, Terraform, and Helm Charts.
What Could Set You Apart
- Data Ops: Experience with Big Data tools (Dataflow/Beam, BigQuery, Pub/Sub).
- Observability: Experience with Prometheus, Grafana, or Google Cloud Operations suite.
- Regulation: Experience working in highly regulated financial environments.