Job Description

About The Opportunity

We are a technology services organization in the IT services and HR technology sector, delivering cloud-hosted platforms and managed infrastructure to enterprise customers. We build and run production-grade SaaS solutions focused on reliability, performance, and secure operations across public cloud environments. This is an on-site Site Reliability Engineer role in India, supporting critical production systems.

Role & Responsibilities

  • Maintain service reliability and uptime for production systems through proactive monitoring, incident response, and root-cause analysis.
  • Implement and operate infrastructure as code to provision, scale, and secure cloud resources across AWS environments.
  • Design, build, and maintain container orchestration platforms, CI/CD pipelines, and automated deployment workflows.
  • Develop and operate observability tooling (metrics, logs, traces) and dashboards that surface service level indicators and objectives (SLIs/SLOs) and help reduce MTTR.
  • Automate repetitive operational tasks with scripts or small services, and own the runbooks used in on-call rotations.
  • Collaborate with development teams to improve application resiliency, capacity planning, and release practices.

Skills & Qualifications

Must-Have

  • Kubernetes
  • Docker
  • Linux
  • AWS
  • Terraform
  • Prometheus
  • Grafana
  • Jenkins

Preferred

  • Python
  • Golang
  • HashiCorp Vault

Additional Qualifications

  • Proven experience operating production services, with a strong focus on reliability, automation, and observability.
  • Familiarity with on-call practices, incident management workflows, and post-incident remediation.
  • Ability to work on-site in India and collaborate across engineering, product, and support teams.

Benefits & Culture Highlights

  • Hands-on, outcome-driven engineering culture with ownership of end-to-end production systems.
  • Opportunity to influence architecture, tooling, and SRE practices for mission-critical platforms.
  • Structured on-call support, knowledge-sharing forums, and career growth into platform engineering roles.

Skills: Kubernetes, Docker, AWS, Jenkins, Prometheus, Grafana, site reliability engineering, Linux, Python, Terraform

More Info

Job ID: 145601845