This role is for one of our clients.
We are looking for a highly skilled Senior DevOps Engineer to lead the design, automation, and management of modern cloud infrastructure with a strong focus on Kubernetes. In this role, you will be responsible for building scalable, secure, and highly available platforms that support mission-critical applications. You will drive infrastructure automation, CI/CD excellence, observability, and cloud operations while partnering closely with engineering teams to ensure seamless deployments and operational reliability.
Key Responsibilities
- Architect, deploy, and manage production-grade Kubernetes clusters to support scalable and resilient applications.
- Design and maintain cloud infrastructure across AWS, Azure, or GCP using Infrastructure-as-Code (IaC) practices.
- Build, optimize, and manage CI/CD pipelines to enable reliable and automated software delivery.
- Implement GitOps workflows and deployment automation using modern DevOps tools.
- Manage Kubernetes ecosystem components including Helm, ingress controllers, autoscaling, networking, and stateful workloads.
- Establish and maintain monitoring, logging, and alerting frameworks to ensure system visibility and proactive issue resolution.
- Lead incident response, troubleshooting, root cause analysis, and performance optimization initiatives.
- Drive cloud security, secrets management, compliance, and infrastructure governance best practices.
- Optimize infrastructure costs while maintaining performance, scalability, and reliability.
- Develop operational runbooks, documentation, and automation frameworks to improve engineering productivity.
- Collaborate with development teams to improve deployment processes and overall platform efficiency.
What Makes You a Great Fit
- 4–6 years of experience in DevOps, Cloud Infrastructure, or Site Reliability Engineering (SRE) roles.
- Deep hands-on expertise in Kubernetes administration, troubleshooting, scaling, and production operations.
- Strong experience with cloud platforms such as AWS, Azure, or GCP.
- Proficiency with Infrastructure-as-Code tools like Terraform and containerization technologies such as Docker.
- Hands-on experience with CI/CD and GitOps tools including GitHub Actions, GitLab CI, Jenkins, ArgoCD, Flux, or similar platforms.
- Strong scripting and automation skills using Bash, Python, Go, or other scripting or programming languages.
- Experience implementing observability solutions using tools such as Prometheus, Grafana, and centralized logging platforms.
- Solid understanding of Linux systems, networking fundamentals, security principles, and cloud architecture.
- Strong problem-solving abilities with an automation-first and reliability-focused mindset.
- Excellent communication and collaboration skills, with the ability to work effectively across engineering and operations teams.
- A proactive, ownership-driven approach to building scalable infrastructure and driving continuous improvement.