IndiGo (InterGlobe Aviation Ltd)

SRE Engineer

This job is no longer accepting applications

  • Posted 2 months ago

Job Description

Job Summary

The Site Reliability Engineer is responsible for ensuring the reliability, availability, performance, and scalability of infrastructure and applications. The role emphasizes automation, monitoring, incident management, and continuous improvement, working closely with development and operations teams.

Key Responsibilities

  • Ensure high availability, reliability, and performance of production systems
  • Design, implement, and maintain monitoring, alerting, and observability solutions
  • Automate infrastructure provisioning, deployments, and operational tasks
  • Lead incident response, troubleshooting, and root cause analysis (RCA)
  • Optimize system performance, scalability, and capacity planning
  • Collaborate with development teams to improve application reliability and operability
  • Define, track, and improve SLAs
  • Reduce operational toil through automation and process improvement
  • Ensure security, compliance, and best operational practices
  • Participate in on-call rotations and provide production support

Required Skills / Must-Have

Technical Skills

  • Linux/Unix system administration
  • Kubernetes / OpenShift administration and troubleshooting
  • Cloud platforms: AWS / Azure
  • Monitoring & observability: Prometheus, Grafana, ELK, Datadog
  • Scripting: Shell, Python, or Go
  • Infrastructure as Code: Terraform, Ansible, Helm
  • CI/CD pipelines and DevOps practices

Experience

  • Experience in SRE / DevOps / Platform Engineering / Production Support
  • Experience managing production-grade distributed systems

Nice-to-Have / Preferred Skills

  • Service mesh experience (Istio, Linkerd)
  • Messaging systems: Kafka, ActiveMQ, RabbitMQ
  • Performance testing and load testing tools
  • Security and compliance experience in regulated environments
  • Exposure to Google SRE principles and practices

Education & Qualifications

Primary / Preferred Education

  • Bachelor's degree in Computer Science, Information Technology, or related field (preferred)

Certifications / Licenses

Preferred (Not Mandatory)

  • Red Hat OpenShift certification

Skills Grouping & Synonyms (for AI Matching)

Operations & Reliability

  • Site Reliability Engineering / Production Support / Platform Engineering
  • Incident management / Major incident / RCA / Postmortem

Cloud & Containers

  • Kubernetes / OpenShift / Container orchestration
  • Cloud infrastructure / IaaS / PaaS

Automation & DevOps

  • Infrastructure as Code / IaC / Terraform / Ansible
  • CI/CD / Continuous delivery / Automation

Monitoring & Observability

  • Monitoring / Alerting / Metrics / Logging / Tracing
  • Prometheus / Grafana / ELK / APM

Location & Work Mode

  • Location: Gurugram, Haryana
  • Work Mode: Onsite

Job ID: 140244575