Senior Site Reliability Engineer

Grab (Grab a Grub Services Ltd)

Mumbai, India

4-6 Years

Posted a day ago

Job Description

About the Role

We are seeking a highly skilled and proactive Senior Site Reliability Engineer (SRE) to join our dynamic team. In this role, you will be the cornerstone of our application reliability and performance, working across the entire technology stack. You will bridge the gap between development and operations, taking ownership of our systems' health, from deep-dive debugging and incident response to proactive optimization and preventive engineering.

Your primary mission will be to build, maintain, and improve highly scalable and reliable systems, ensuring an exceptional experience for our users.

Key Responsibilities

System Reliability & Performance:

Design, implement, and maintain highly available, scalable, and fault-tolerant systems.
Ensure performance, quality, and responsiveness of applications.
Automate operational processes to improve efficiency and reduce manual toil.

Incident Management & Response:

Lead the response to, and resolution of, critical incidents and outages.
Participate in an on-call rotation, serving as an escalation point for complex system issues.
Work under pressure to diagnose and mitigate service disruptions.

Root Cause Analysis & Preventive Measures:

Conduct thorough post-incident reviews and Root Cause Analysis (RCA).
Drive the implementation of corrective and preventive actions to avoid problem recurrence.
Champion a culture of blameless postmortems and continuous improvement.

Application Maintenance & Support:

Provide ongoing support, maintenance, and optimization for applications throughout their lifecycle.
Debug complex issues across the entire technology stack, from front-end to back-end and database layers.
Collaborate with development teams to improve code deployment, monitoring, and operational readiness.

Monitoring & Observability:

Use New Relic and other tools to build comprehensive monitoring, alerting, and dashboards.
Analyze performance data to identify trends, predict capacity needs, and pinpoint bottlenecks before they impact users.

Qualifications & Technical Skills (What We're Looking For)

Must-Have:

4-6 years of experience in a Site Reliability Engineering, DevOps, or similar software engineering role with a focus on operations.
Strong hands-on experience in debugging and supporting applications built on:
    PHP and Node.js
    MySQL and MongoDB
Proven expertise in using New Relic (or similar tools such as ELK or Splunk) for deep-dive performance analysis and application monitoring.
Demonstrable experience in leading incident management, from detection to resolution, and conducting formal RCAs.
Solid understanding of Linux/Unix operating systems and networking fundamentals.

Good-to-Have:

Proficiency with containerization and orchestration technologies (e.g., Docker, Kubernetes).
Experience with CI/CD pipelines (e.g., Jenkins, GitLab CI, GitHub Actions).
Knowledge of cloud platforms (e.g., AWS, GCP, Azure).

More Info

Job Type: Permanent Job

Industry: Other

Function: Site Reliability Engineering

Employment Type: Full time

About Company

Grab (Grab a Grub Services Ltd)

Job Source: www.linkedin.com

Job ID: 143765223

Last Updated: 02-03-2026 09:08:04 PM
