Site Reliability Engineering

Viraaj HR Solutions Private Limited

Coimbatore, India

Fresher

Posted 2 hours ago

Job Description

About The Opportunity

We are a technology services organization in the IT Services / HR Technology sector, delivering cloud-hosted platforms and managed infrastructure for enterprise customers. We build and run production-grade SaaS solutions focused on reliability, performance, and secure operations across public cloud environments. This on-site role is for a Site Reliability Engineer supporting critical production systems in India.

Role & Responsibilities

Maintain service reliability and uptime for production systems through proactive monitoring, incident response, and root-cause analysis.
Implement and operate infrastructure as code to provision, scale, and secure cloud resources across AWS environments.
Design, build, and maintain container orchestration platforms, CI/CD pipelines, and automated deployment workflows.
Develop and operate observability tooling (metrics, logs, traces) and dashboards to surface SLIs/SLOs and reduce MTTR.
Automate repetitive operational tasks with scripts or small services and own runbooks for on-call rotations.
Collaborate with development teams to improve application resiliency, capacity planning, and release practices.

Skills & Qualifications

Must-Have

Kubernetes
Docker
Linux
AWS
Terraform
Prometheus
Grafana
Jenkins

Preferred

Python
Golang
HashiCorp Vault

Additional Qualifications

Proven experience operating production services with a strong focus on reliability, automation, and observability.
Familiarity with on-call practices, incident management workflows, and post-incident remediation.
Ability to work on-site in India and collaborate across engineering, product, and support teams.

Benefits & Culture Highlights

Hands-on, outcome-driven engineering culture with ownership of end-to-end production systems.
Opportunity to influence architecture, tooling, and SRE practices for mission-critical platforms.
Structured on-call support, knowledge-sharing forums, and career growth into platform engineering roles.

Skills: Kubernetes, Docker, AWS, Jenkins, Prometheus, Grafana, Site Reliability Engineering, Linux, Python, Terraform