Site Reliability Engineering (SRE)

Fresher

Job Description

Role & Responsibilities

Design, implement, and maintain reliable, scalable, and efficient cloud infrastructure and services.
Monitor system performance, troubleshoot issues, and tune systems to maximize uptime.
Automate infrastructure provisioning, deployment, and management tasks using scripting and infrastructure-as-code (IaC) tools.
Collaborate with development teams to ensure reliability and scalability of applications and services.
Manage incident response, root cause analysis, and post-incident reviews to improve system robustness.
Document processes, conduct capacity planning, and implement security best practices for systems.

Skills & Qualifications

Must-Have
Proficiency in cloud platforms such as AWS, GCP, or Azure
Experience with Linux system administration
Knowledge of containerization and orchestration tools such as Docker and Kubernetes
Experience with automation tools such as Ansible, Terraform, or similar
Strong scripting skills in Bash, Python, or similar languages
Understanding of networking and security fundamentals, and of monitoring tools
Experience with incident management and troubleshooting

Preferred
Certifications such as AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, or similar
Experience in a 24/7 on-call environment

Benefits & Culture Highlights