Senior Site Reliability Engineer

5-8 Years
  • Posted 2 days ago

Job Description

Responsibilities:

  • Assist the Principal SRE in driving the architecture, development and operation of Stoxx's SRE efforts
  • Mentor junior members of the SRE team, offering expertise and advice as needed
  • Work a shift and on-call pattern aligned with the global regions to provide 24-hour coverage, 5 days a week
  • Align engineering effort with SRE principles
  • Engage with engineering and product teams to implement Service Level Objectives
  • Work with other teams to implement observability solutions
  • Work in cross-functional teams to implement items in the SRE roadmap
  • Work on continuous improvement initiatives to enhance application performance and customer satisfaction
  • Keep abreast of emerging trends and technologies in SRE, and promote them across engineering and business functions
  • Implement incident management tooling, working closely with the Service Management team
  • Own problem resolution and ensure fixes are implemented in a timely fashion

Requirements:

  • 5 years of total experience and at least 3 years of relevant experience in Site Reliability Engineering or production management
  • Good understanding of SRE principles
  • Experience implementing observability stacks such as ELK, Prometheus/Grafana, Splunk, Datadog or another scalable solution
  • Expertise in creating SLO dashboards using multiple data sources
  • Strong experience of cloud-native ways of working
  • Experience with the development and deployment of large-scale, complex technology platforms
  • Deep understanding of cloud technology across databases, serverless, containerization and APIs
  • Advanced level expertise in Terraform
  • Extensive experience in designing and implementing SRE practices
  • Experience with one or more CI/CD solutions
  • Experience coaching and mentoring high-performing teams
  • Excellent knowledge of integrating incident management tooling such as Rootly, Blameless, ServiceNow or incident.io
  • Pragmatic experience using agile to deliver incremental value
  • Experience working in a global or multinational team setting
  • Strong knowledge management, documentation, communication and collaboration skills
  • Proven ability to drive innovation and continuous improvement initiatives
  • Focus on simplicity, automation and data
  • Expertise in Python, GitHub Actions, Apigee, Airflow
  • Bachelor's or Master's degree in Computer Science, Mathematics, Physics or a related field

More Info

Open to candidates from: Indian

Job ID: 120340965
