Senior Site Reliability Engineer

Experience: 10-12 Years

This job is no longer accepting applications

Job Description

Core Responsibilities

Automation & Coding: Developing scripts and tools (Python, Go, Java, Bash) to automate operational tasks and eliminate manual, repetitive work.
System Monitoring & Alerting: Using tools such as Prometheus, Grafana, Datadog, or the ELK Stack to monitor system health, latency, and error rates.
Incident Management: Responding to production incidents, performing root cause analysis (RCA), and conducting blameless post-mortems.
Capacity Planning & Scaling: Managing infrastructure capacity and performance to ensure scalability, often using cloud platforms like AWS, GCP, or Azure.
Collaboration: Working with development teams to improve service performance, reliability, and deployment procedures.

Required Skills and Qualifications

BE/BTech with 10+ years of experience as an SRE
Willing to take a contract role with rotational shifts (4 AM, 2 PM) in Pune
Programming: Proficiency in at least one scripting or programming language (Python, Go, Ruby).
Infrastructure & Tools: Experience with Kubernetes, Docker, and Infrastructure as Code (IaC) tools like Terraform or Ansible.
System Administration: Strong knowledge of Linux/Unix operating systems and networking protocols (TCP/IP, DNS).
Background: A degree in Computer Science or equivalent experience, ideally combined with prior work in software development or system administration.

Typical SRE Job Profile Summary