Site Reliability Engineer

InOpTra Digital

Bengaluru, India

5-7 Years

Job Description: Site Reliability Engineer

For this position, we're looking for talented and experienced engineers with a passion for infrastructure and automation.

As a Site Reliability Engineer (SRE), you will work within the development team, combining software and systems engineering to build and run large-scale distributed systems. You will also be responsible for the capacity and performance of the client's systems.

Responsibilities

Taking part in architecture-level discussions, design, planning, and implementation.
Researching options to ensure that what we build is always the best path forward.
Documenting each project to facilitate integration for users.
Driving proofs of concept and minimum viable products (MVPs) for demonstration.
Designing and delivering Infrastructure as Code.
Developing and implementing automation for routine tasks, including alerting, system monitoring, and response mechanisms.
Developing and maintaining dashboards for monitoring and observability.
Supporting multiple services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
Managing incidents and participating in the on-call rotation.

Education And Experience

To succeed in this role, candidates must have strong foundational knowledge of, and demonstrated proficiency in, Linux/Unix (e.g., Talos).
At least 5 years of experience as an SRE or in a similar role, such as DevOps or Software Engineer.
At least two years of programming experience in a conventional programming language.
Kubernetes knowledge is required. Experience with bare-metal or unmanaged Kubernetes would be a plus.
Experience in Python and other scripting languages.
Experience with infrastructure-as-code and configuration management tools (e.g., Terraform, Ansible, Helm, Puppet, or Chef).
Experience with networking and cloud computing platforms.
Proficiency in scripting and programming languages (e.g., Bash, Python, Go, Node, Java, or similar).
Familiarity with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK Stack, or similar).
Experience with Grafana Mimir.
Familiarity with CI/CD tools and SDLC practices.
You have strong problem-solving and excellent communication skills.
You can work independently as well as collaboratively in a remote team environment.

You are friendly, collaborative, humble, honest, and always s