- Guides and assists others in building designs at the appropriate level and in gaining consensus from peers
- Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
- Collaborates with other software engineers and teams to design, develop, test, and implement available, reliable, and scalable solutions in their applications
- Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
- Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
- Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
- Supports the adoption of site reliability engineering best practices within your team
Required qualifications, capabilities, and skills
- Formal training or certification in Site Reliability Engineering concepts and 3+ years of applied experience
- Ability to code in at least one programming language
- Exposure to cloud platforms (e.g., AWS, Google Cloud, Azure) and understanding of cloud services
- Familiar with site reliability concepts, principles, and practices
- Familiar with observability practices such as white-box and black-box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Basic understanding of Linux/Unix systems, including command-line tools and shell scripting
- Experience with scripting languages such as Python, Bash, or similar
- Emerging knowledge of continuous integration, continuous delivery, and infrastructure-as-code tools like Jenkins, GitLab, or Terraform
- Ability to demonstrate and apply existing and new system processes, methodologies, and skills to contribute to the development of systems
Preferred qualifications, capabilities, and skills
- General knowledge of the financial services industry