Eutech

Site Reliability Engineer Lead

Job Description

Position Overview

We are seeking a knowledgeable, highly skilled Site Reliability Engineer with the aspiration and demonstrable qualities to grow into the role of Functional Head at our dynamic and growing organization. Initially, the role focuses on ensuring the reliability, performance, and scalability of our infrastructure and services. In parallel, over roughly 12 to 18 months, the candidate will play a critical role in hiring for and leading the Site Reliability Engineering (SRE) team. Technical expertise and leadership skills will be instrumental in driving operational excellence and fostering a culture of continuous improvement within the team.

Responsibilities

  • Align the Site Reliability Engineering (SRE) team's strategic direction with company objectives by developing plans to enhance system reliability, uptime, and service quality, while collaborating with technical leaders to set and achieve SRE goals.
  • Lead and mentor the SRE team, conduct performance evaluations, provide feedback, and develop training programs to address skills gaps.
  • Oversee monitoring, alerting, and incident response processes, collaborate with development teams for reliable software delivery, and enforce SRE best practices and standards.
  • Own critical incident resolution with timely stakeholder communication, conduct post-incident reviews and root cause analysis, and implement effective escalation procedures.
  • Identify automation opportunities to streamline SRE workflows and enhance reliability, and implement tools and technologies to optimize operations.
  • Collaborate with infrastructure and capacity planning teams to ensure resource adequacy, monitor performance, analyze trends, and recommend capacity improvements.
  • Foster a culture of continuous improvement and innovation within the SRE team, encouraging and supporting professional development opportunities.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related fields.
  • 3 to 5 years of experience as a Site Reliability Engineer or a related role.
  • Knowledge of monitoring and observability tools (e.g., DataDog, Prometheus).
  • Strong background in system administration and infrastructure management.
  • Solid understanding of networking principles, TCP/IP, load balancing, and DNS.
  • Experience with incident response and on-call rotations, and familiarity with incident management tools (e.g., Jira).
  • Familiarity with programming and scripting languages such as Python and C++.
  • Solid understanding of Linux/Unix systems and networking concepts.
  • Proficiency in shell scripting.
  • Experience with cloud platforms such as AWS, Azure, or Oracle Cloud.
  • Knowledge of containerization technologies like Docker and container orchestration platforms like Kubernetes.
  • Familiarity with Infrastructure as Code (IaC) tools like Terraform or Ansible and version control tools (e.g., Git).
  • Strong analytical and problem-solving skills with a passion for troubleshooting complex issues.
  • Excellent communication and collaboration abilities to work effectively with cross-functional teams.
  • DevOps mindset and a drive for automation to improve efficiency and repeatability.

More Info

Industry: Other

Function: Technology

Job Type: Permanent Job

Date Posted: 20/06/2024

Job ID: 82379253


Last Updated: 21-06-2024 10:42:37 AM