Location: Pune
Experience: Minimum of 5 years in Site Reliability Engineering, DevOps, or Infrastructure Operations.
Employment Type: Full-time
Job Overview
We are seeking a skilled Site Reliability Engineer (SRE) with experience in private cloud and infrastructure operations. The role focuses on ensuring the reliability, scalability, performance, and security of enterprise infrastructure while driving automation, observability, and DevOps best practices.
Key Responsibilities
- Design and maintain highly available and fault-tolerant infrastructure systems.
- Monitor system performance and ensure infrastructure reliability and scalability.
- Lead incident response, root cause analysis (RCA), and system performance improvements.
- Develop automation tools and scripts to streamline operations and reduce manual tasks.
- Implement Infrastructure as Code (IaC) using tools such as Terraform or Ansible.
- Support containerized environments using Docker, Kubernetes, or OpenShift.
- Build and maintain monitoring, logging, and alerting systems.
- Collaborate with DevOps, development, and security teams to support CI/CD pipelines and ensure secure infrastructure operations.
Required Skills
- Strong experience in Linux/Unix system administration.
- Proficiency in Python, Go, or Bash/shell scripting.
- Experience with cloud platforms (AWS, Azure, or GCP).
- Hands-on experience with containerization and orchestration technologies.
- Good understanding of networking concepts (DNS, TCP/IP, Load Balancing, Firewalls).
- Experience with monitoring and observability tools such as Prometheus, Grafana, ELK, or Datadog.