inoptra digital

Site Reliability Engineer

  • Posted 5 hours ago

Job Description

Job Description: Site Reliability Engineer

For this position, we're looking for talented & experienced engineers who have a passion for infrastructure & automation.

As a Site Reliability Engineer (SRE), you will work within the development team, combining software and systems engineering to run large-scale distributed systems. You will also maintain the capacity and performance of the client's systems.

Responsibilities

  • Taking part in architecture-level discussions, design, planning, and implementation.
  • Researching to ensure what we are building is always the best path forward.
  • Documenting each project to facilitate integration for users.
  • Driving proofs of concept and minimum viable products for demonstration.
  • Designing and delivering Infrastructure as Code.
  • Developing and implementing automation for routine tasks, including alerting, system monitoring, and response mechanisms.
  • Developing and maintaining dashboards for monitoring and observability.
  • Supporting multiple services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
  • Managing incidents and participating in the on-call rotation.

Education And Experience

  • To succeed in this role, candidates must have strong foundational knowledge of, and demonstrated proficiency in, Linux/Unix (Talos).
  • At least five years of experience as an SRE or in a similar role, such as DevOps or Software Engineer.
  • At least two years of programming experience in a conventional programming language.
  • Kubernetes knowledge is required. Experience with bare metal / non-managed Kubernetes would be a plus.
  • Experience in Python and other scripting languages.
  • Experience with infrastructure-as-code and configuration management tools (e.g., Terraform, Ansible, Helm, Puppet, or Chef).
  • Networking and cloud computing platform experience.
  • Proficiency in scripting and programming languages (e.g., Bash, Python, Go, Node, Java, or similar).
  • Familiarity with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK Stack, or similar).
  • Experience with Grafana Mimir.
  • Familiarity with CI/CD tools and SDLC practices.
  • You have strong problem-solving skills and excellent communication skills.
  • You can work independently as well as collaboratively in a remote team environment.

You are friendly, collaborative, humble, honest, and always s

More Info

Job ID: 144931683
