Role & Responsibilities
- Design, implement, and maintain reliable, scalable, and efficient cloud infrastructure and services.
- Monitor system performance, troubleshoot issues, and tune systems to maximize uptime.
- Automate infrastructure provisioning, deployment, and management tasks using scripting and IaC tools.
- Collaborate with development teams to ensure reliability and scalability of applications and services.
- Manage incident response, root cause analysis, and post-incident reviews to improve system robustness.
- Document processes, conduct capacity planning, and implement security best practices for systems.
Skills & Qualifications
- Must-Have
  - Proficiency in cloud platforms such as AWS, GCP, or Azure
  - Experience with Linux system administration
  - Knowledge of containerization and orchestration tools such as Docker and Kubernetes
  - Experience with automation tools such as Ansible, Terraform, or similar
  - Strong scripting skills in Bash, Python, or similar languages
  - Understanding of networking, security, and monitoring tools
  - Experience with incident management and troubleshooting
- Preferred
  - Certifications like AWS Solutions Architect, GCP Professional Cloud Engineer, or similar
  - Experience in a 24/7 on-call environment
Benefits & Culture Highlights
- Dynamic and collaborative work environment
- Opportunities for professional growth and certification
- On-site work with modern office facilities in India
Skills: incident management, AWS, cloud, scripting