We are seeking a skilled Senior Systems Engineer - Cloud Operations to join our team. In this role, you will manage and optimize cloud infrastructure while ensuring seamless operations and governance compliance.
If you have extensive experience with GCP or similar environments and are passionate about problem-solving, collaboration, and documentation, we'd love to hear from you.
Responsibilities
- Provision and maintain GCP resources to ensure scalability and security
- Collaborate on GKE cluster OS upgrades, container OS version libraries, and patching activities
- Execute cloud onboarding and infrastructure updates using Infrastructure as Code (IaC) tools such as Terraform
- Proactively monitor logs and metrics to address performance or availability issues
- Diagnose recurring issues and contribute to automation and runbook development
- Enforce backup policies, validate restoration processes, and tackle backup-related challenges
- Support access management processes and troubleshoot permissions issues for governance compliance
- Assist in automating tagging enforcement and resource monitoring
- Resolve infrastructure-related issues by working closely with customer teams
- Create and maintain operational manuals, runbooks, and supporting scripts
Requirements
- 5-8 years of experience in cloud environments such as Google Cloud Platform (GCP)
- Proficiency in Infrastructure as Code (IaC) tools, such as Terraform, with a strong focus on implementation
- Knowledge of cloud monitoring tools, incident troubleshooting, and performance optimization
- Understanding of governance processes, tagging frameworks, and access management
- Familiarity with containerized ecosystems like GKE and Docker, as well as patch management processes
- Strong problem-solving skills and the ability to collaborate across teams
- Expertise in creating clear documentation, operational manuals, and runbooks
- Flexible availability for shift work