Service Reliability Engineer (SRE)

Experience: 5-15 years
Job Description


Service Reliability Engineer (SRE) - Tech Focused Role

We're looking for a passionate Service Reliability Engineer (SRE) who thrives on maintaining and improving the reliability of our services. Our tech stack includes Prometheus, Alertmanager, Grafana, Rancher, OpenSearch, Jaeger, Docker, and Kubernetes (Azure Kubernetes Service), along with integrations such as Jira, SMTP, Microsoft Teams, webhooks, REST APIs, and ServiceNow for monitoring and incident management. If you're eager to work with these technologies and make a significant impact, we want you!

Key Responsibilities:

Design, deploy, and maintain our monitoring, logging, and alerting infrastructure using Prometheus, Grafana, Alertmanager, and OpenSearch.
Manage container orchestration using Kubernetes (AKS) and ensure seamless deployment and scaling of our services with Docker and Rancher.
Implement and maintain distributed tracing and observability with Jaeger to diagnose and troubleshoot microservices.
Develop and maintain integrations between our monitoring tools and platforms such as Jira, SMTP, Microsoft Teams, webhooks, REST APIs, and ServiceNow for efficient incident management and communication.
Work closely with development teams to ensure the reliability and scalability of our applications and services.
Automate processes and implement Infrastructure as Code (IaC) to minimize manual work and improve efficiency.
Participate in on-call rotations to ensure high availability of our services and quick resolution of incidents.
Contribute to the continuous improvement of our SRE practices by keeping up with the latest trends and technologies.

Requirements:

Strong experience with Kubernetes, Docker, and container orchestration tools.
Proficient in monitoring and observability tools such as Prometheus, Grafana, and Jaeger.
Experience with logging and alerting systems, preferably Alertmanager and OpenSearch.
Familiarity with Rancher for managing Kubernetes clusters is a plus.
Solid understanding of CI/CD pipelines and automation tools.
Proficiency in scripting languages for automation.

Excellent problem-solving skills and the ability to work under pressure.
Strong communication skills, plus experience integrating with tools such as Microsoft Teams, SMTP, and REST APIs.
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent work experience.




