Description
We are seeking a skilled Site Reliability Engineer (SRE) to join our team in India. The ideal candidate will have a strong background in managing and improving production systems to ensure their reliability, scalability, and performance.
Responsibilities
- Design, implement, and maintain scalable and reliable systems.
- Monitor system performance and troubleshoot issues.
- Automate operational processes to reduce manual intervention.
- Collaborate with development teams to improve system architecture and deployment processes.
- Participate in on-call rotations to support production systems.
- Perform capacity planning and ensure system scalability.
- Develop and maintain documentation for systems and processes.
Skills and Qualifications
- 5–7 years of experience in Site Reliability Engineering or a related field.
- Strong knowledge of Linux/Unix systems and shell scripting.
- Experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Proficiency in programming languages such as Python, Go, or Java.
- Familiarity with containerization technologies like Docker and orchestration tools like Kubernetes.
- Experience with monitoring tools such as Prometheus, Grafana, or Nagios.
- Strong understanding of networking concepts and protocols.
- Knowledge of CI/CD pipelines and tools like Jenkins, GitLab CI, or CircleCI.