About this role:
- Wells Fargo is seeking a Lead Systems Operations Engineer
- The Lead Site Reliability Engineer (App SRE) is responsible for driving reliability, automation, observability, and performance for mission-critical applications and platforms.
- This role blends software engineering excellence with operational expertise to deliver stable, scalable, and resilient services, while reducing toil and shifting operations left across the application lifecycle.
- The Lead SRE acts as a technical authority and mentor, partnering with application, platform, and DevOps teams to embed reliability into design, delivery, and operations.
In this role, you will:
- Lead complex, broad-impact initiatives, including providing high-level systems consultation to technology teams
- Work as a key participant in large-scale planning of computer systems and network infrastructure for the Systems Operations functional area
- Review and analyze complex technical challenges, as well as escalated support issues related to core business solutions, that require in-depth evaluation of multiple factors, such as alternatives, enhancements, periodic systems reviews, or improvements to existing systems
- Make decisions on technical changes and enhancements
- Consult with the engineering team on change designs that require a solid understanding of the technical process controls or standards that influence and drive new initiatives
- Collaborate and consult with technical peers, colleagues, and mid-level to more experienced managers to resolve systems support issues and achieve goals
Required Qualifications:
- 5+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
Job Expectations:
- Partner with application, platform, and business stakeholders to define, implement, and govern SLIs, SLOs, and error budgets, balancing reliability with delivery velocity.
- Lead the design and continuous improvement of observability, telemetry, monitoring, and alerting, ensuring actionable insights and reduced alert fatigue.
- Identify, prioritize, and implement automation and self-healing solutions to eliminate operational toil and improve service resilience.
- Own and lead production readiness and go-live activities, including non-functional requirement (NFR) validation, Permit to Operate (PTO), and operational risk assessments.
- Provide engineering-led application production support, acting as an escalation point for complex application and platform issues.
- Lead major incident response and troubleshooting (P1/P2/P3), drive in-depth root cause analysis (RCA), and ensure preventative actions are implemented to achieve long-term stability.
- Influence and guide teams to shift reliability left by embedding SRE practices into design, CI/CD pipelines, and release processes.
- Mentor junior engineers and contribute to SRE standards, best practices, and operating models.
Additional Required Qualifications:
- 5+ years of hands-on experience in production application support engineering, with a strong focus on reliability, availability, and operational excellence.
- 3+ years of experience leading and operating production systems in a Site Reliability Engineering, DevOps, or Reliability Engineering role.
- 3+ years of experience working with enterprise schedulers and databases, such as Autosys, Oracle, and MS SQL Server.
- 3+ years of experience supporting applications on Kubernetes / OpenShift platforms.
- Strong understanding of observability concepts (metrics, logs, traces, APM) using tools such as AppDynamics, ThousandEyes, Prometheus, Grafana, Splunk, and Aternity.
- Solid experience with web-based applications and application servers.
- Proven experience providing technical leadership and hands-on execution in complex enterprise environments.
- Excellent communication and documentation skills, with the ability to influence both technical and non-technical stakeholders.
Desired Qualifications:
- Strong scripting and automation skills using Unix/Linux shell, Python, or Ansible.
- Experience with cloud and container platforms, including Red Hat OpenShift (OCP) and modern cloud architecture concepts.
- Experience working with COTS platforms in regulated or large-scale enterprise environments.
Posting End Date:
31 Mar 2026
We Value Equal Opportunity
Wells Fargo is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other legally protected characteristic.
Employees support our focus on building strong customer relationships balanced with a strong risk mitigating and compliance-driven culture which firmly establishes those disciplines as critical to the success of our customers and company. They are accountable for execution of all applicable risk programs (Credit, Market, Financial Crimes, Operational, Regulatory Compliance), which includes effectively following and adhering to applicable Wells Fargo policies and procedures, appropriately fulfilling risk and compliance obligations, timely and effective escalation and remediation of issues, and making sound risk decisions. There is emphasis on proactive monitoring, governance, risk identification and escalation, as well as making sound risk decisions commensurate with the business unit's risk appetite and all risk and compliance program requirements.
Candidates applying to job openings posted in Canada: Applications for employment are encouraged from all qualified candidates, including women, persons with disabilities, aboriginal peoples and visible minorities. Accommodation for applicants with disabilities is available upon request in connection with the recruitment process.
Applicants with Disabilities
To request a medical accommodation during the application or interview process, visit.
Drug and Alcohol Policy
Wells Fargo maintains a drug-free workplace. Please see our Drug and Alcohol Policy to learn more.
Wells Fargo Recruitment and Hiring Requirements:
a. Third-Party recordings are prohibited unless authorized by Wells Fargo.
b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.