Site Reliability Engineer

Idox plc

Pune, India

4-6 Years

Posted 4 days ago

Job Description

Site Reliability Engineer (AWS)

Engineering, Idox Software

Pune, India

About the role

We are seeking a motivated and detail-oriented Site Reliability Engineer (SRE) with a passion for ensuring application reliability and performance on AWS. This role offers an exciting opportunity for professionals with 4 to 6 years of experience in SRE, Production Support, or Operations to deepen their expertise in observability, incident management, and performance optimisation.

This will be the founding SRE role for our new Pune office, providing leadership growth opportunities as we expand our Pune capability. You will be part of a global SRE function, working with teams across the UK and India to provide comprehensive coverage for our AWS-hosted applications serving government and regulated sector clients.

You'll gain meaningful ownership of application reliability, working closely with experienced engineers across our global teams. This is a high-impact position with tremendous growth potential, ideal for someone ready to take the next step in their SRE career.

Key Responsibilities

Monitor, troubleshoot, and resolve production incidents affecting applications running on our AWS platform.
Implement and improve observability using tools such as CloudWatch, Prometheus, Grafana, and distributed tracing.
Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure application reliability.
Participate in on-call rotations as part of a global follow-the-sun support model.
Conduct root cause analysis and produce post-incident reviews to prevent recurrence.
Identify and resolve performance bottlenecks to ensure applications meet client expectations.
Collaborate with development teams to improve application reliability and operability.
Automate repetitive operational tasks to reduce toil and improve response times.

To be successful, you should bring:

4 to 6 years of experience in SRE, Production Support, DevOps Operations, or similar reliability-focused roles.
Strong working knowledge of AWS services commonly used in production environments (EC2, ECS/EKS, RDS, ALB, CloudWatch, S3).
Experience with monitoring and observability tools (CloudWatch, Prometheus, Grafana, Datadog, or similar).
Familiarity with containerised applications (Docker, Kubernetes).
Scripting skills for automation and analysis (Python, Bash, or similar).
Understanding of application performance fundamentals (latency, throughput, error rates).
Excellent troubleshooting and analytical skills.
Strong communication skills for incident coordination and working with distributed teams.

Desirable qualities:

Experience working in regulated or government software environments.
Exposure to incident management frameworks (PagerDuty, Opsgenie, or similar).
Understanding of SRE principles including SLOs, SLIs, error budgets, and toil reduction.
Experience with log aggregation and analysis (ELK stack, CloudWatch Logs Insights, or similar).
AWS certifications (Solutions Architect Associate, SysOps Administrator, or DevOps Engineer).

What we offer:

Hands-on ownership of production reliability for applications serving critical public sector clients.
Mentorship from experienced engineers across our global SRE and platform teams.
Clear growth path as we expand our Pune SRE capability.
Opportunity to work on meaningful systems that make an impact for government and healthcare clients.
Collaborative, inclusive culture with room to grow and develop your skills.