Site Reliability Engineer - 3

7-12 Years
  • Posted a month ago

Job Description

Key Responsibilities:

Production System Management

  • Manage and support production-grade infrastructure across cloud and data centers.
  • Take ownership of monitoring and troubleshooting production systems, including on-call or shift-based support.
  • Dive deep into Linux system internals, networking, and production debugging.

Monitoring & Observability

  • Build and improve observability stacks using Prometheus, Grafana, ELK/EFK, or equivalent tools.
  • Partner with developers to ensure new features/services are production-ready with monitoring, logging, and failover strategies.

Automation & CI/CD

  • Develop and maintain automation scripts/tools using Python, Bash, or similar languages.
  • Work with CI/CD tools (Jenkins, GitHub Actions, GitLab CI) to support reliable deployments.
  • Continuously improve system availability, reliability, and performance through automation and process improvements.

Incident Management & Reliability

  • Drive incident management and root cause analysis (RCA), and implement long-term fixes.
  • Automate operational tasks to reduce mean time to recovery (MTTR).
  • Engineer systems to prevent recurring problems and ensure reliability at scale.

About Company

Exotel is your AI transformation partner for customer engagement and experience. Trusted by over 7000 clients globally across various industries, we facilitate over 25 billion annual conversations through omnichannel, voice, agents, and bots. Exotel's AI-powered solutions empower agents, bots and customers alike, enhancing interactions with conversational intelligence, and optimising resources to deliver exceptional CX and business growth. Exotel wins when you win.

Job ID: 130609413
