PwC India

Site Reliability Engineer

  • Posted 5 hours ago

Job Description

Opportunity

We are looking for SREs who want to define what reliability means for the next generation of industrial software. The role involves defining SLIs/SLOs, building observability platforms, and establishing incident management processes.

Responsibilities

  • Define and implement SLI/SLO frameworks for complex engineering systems across manufacturing and industrial clients
  • Design and deploy observability platforms using Prometheus, Grafana, and Datadog
  • Establish incident management processes and lead blameless post-mortems
  • Implement chaos engineering practices to proactively identify system weaknesses
  • Drive toil elimination through automation and platform improvements
  • Build reliability engineering capabilities within the practice and client organisations

Essential Skills

  • SLI/SLO definition and implementation at enterprise scale
  • Observability: Prometheus, Grafana, Datadog, New Relic
  • Incident management and post-mortem facilitation
  • Chaos engineering: Gremlin, Chaos Monkey, Litmus
  • Python testing for reliability validation and automated runbooks
  • Automation and scripting: Python, Go, Bash
  • Cloud platforms: AWS, Azure, GCP

Experience

5 to 10 years in SRE or Production Engineering roles, with experience in enterprise or industrial environments

More Info

Job ID: 145400341