Description
We are seeking a Site Reliability Engineer with 3-5 years of experience to join our dynamic team. The ideal candidate will be responsible for ensuring the reliability, performance, and availability of our systems and applications. This role is based in India and requires a deep understanding of DevOps methodologies and tools.
Responsibilities
- Design, implement, and maintain highly available and scalable systems and applications
- Perform system monitoring, troubleshooting, and incident management
- Collaborate with developers to ensure code quality, testing, and deployment automation
- Create and maintain infrastructure as code using tools such as Terraform, Ansible, and Puppet
- Participate in an on-call rotation to provide 24/7 support for production systems
Skills and Qualifications
- Bachelor's or Master's degree in Computer Science or a related field
- 3-5 years of experience in Site Reliability Engineering or a related field
- Strong understanding of Linux/Unix operating systems and networking protocols
- Experience with cloud infrastructure providers such as AWS, Azure, or Google Cloud Platform
- Proficiency in programming languages such as Python, Ruby, or Java
- Experience with container orchestration tools such as Kubernetes or Docker Swarm
- Experience with monitoring and logging tools such as Prometheus, the ELK stack, or Grafana
- Excellent problem-solving and communication skills