Description
We are looking for a Site Reliability Engineer to join our team in India. The ideal candidate will work closely with development and operations teams to ensure the reliability and performance of our systems. The role calls for a proactive approach to system monitoring, incident management, and infrastructure automation.
Responsibilities
- Monitor and maintain system performance and reliability.
- Implement and manage CI/CD pipelines.
- Automate operational tasks and improve system efficiency.
- Collaborate with development teams to design scalable and reliable applications.
- Troubleshoot and resolve production incidents promptly.
- Participate in on-call rotations and incident response activities.
- Develop and maintain documentation for system architecture and processes.
Skills and Qualifications
- 1-10 years of experience in Site Reliability Engineering or related fields.
- Strong knowledge of Linux/Unix systems.
- Experience with cloud platforms (AWS, GCP, Azure).
- Proficiency in at least one programming language (Python, Go, Java, etc.).
- Familiarity with containerization technologies (Docker, Kubernetes).
- Understanding of networking concepts and protocols.
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK Stack).
- Knowledge of automation and infrastructure-as-code tools (Ansible, Terraform, etc.).
- Excellent problem-solving skills and attention to detail.