CloudOps Engineer
Job Description
Role Summary
The CloudOps Engineer is responsible for the day-to-day operations, reliability, performance, and cost optimization of cloud environments across infrastructure, applications, and data platforms. This role ensures secure, scalable, and highly available cloud operations by implementing automation, monitoring, and governance aligned with organizational standards.
Key Responsibilities
Cloud Operations & Reliability
- Manage and monitor cloud infrastructure, applications, and platform services to ensure high availability and performance
- Implement incident management, root cause analysis (RCA), and problem resolution
- Ensure uptime, reliability, and performance using SRE practices (SLI/SLO/SLA)
- Handle on-call support and production issues
Platform & Environment Management
- Operate and maintain cloud environments (dev/test/stage/prod)
- Manage subscriptions/accounts, RBAC, IAM roles, and access controls
- Maintain network configurations, VMs, containers, storage, and platform services
- Support deployment pipelines and environment provisioning
Automation & DevOps Enablement
- Develop and maintain Infrastructure-as-Code (Terraform/Bicep/CloudFormation)
- Automate deployment, scaling, patching, and configuration management
- Support CI/CD pipelines and ensure smooth release management
- Implement auto-scaling and self-healing mechanisms
Monitoring, Logging & Observability
- Implement and manage monitoring tools, alerts, dashboards, and logging frameworks
- Ensure proactive detection of issues using metrics, logs, and traces
- Optimize system performance and reduce downtime
Security, Compliance & Governance
- Enforce security best practices (IAM, encryption, network security)
- Ensure compliance with organizational policies and regulatory requirements
- Implement backup, disaster recovery (DR), and business continuity planning (BCP)
- Monitor cloud spend, enforce resource tagging, and maintain budget controls
Migration & Support
- Support cloud migration activities (rehost, replatform) from an operations perspective
- Validate deployment readiness, rollback strategies, and runbooks
- Ensure smooth transition to production and post-go-live support
Collaboration & Continuous Improvement
- Work closely with DevOps, development, architecture, and security teams
- Improve operational efficiency through automation and optimization
- Document runbooks, SOPs, and operational procedures
Required Experience
- 5–10 years of experience in Cloud Operations / DevOps / SRE roles
- Hands-on experience with Azure / AWS / GCP cloud platforms
- Strong knowledge of:
  - Infrastructure-as-Code (Terraform/Bicep/CloudFormation)
  - CI/CD tools (Azure DevOps, Jenkins, GitHub Actions)
  - Monitoring tools (CloudWatch, Azure Monitor, Prometheus, Grafana)
- Experience with:
  - Containers (Docker, Kubernetes)
  - Networking, IAM, and security best practices
- Familiarity with incident management, DR/BCP, and cost optimization