Caizin

Senior Cloud Engineer

4-8 Years
  • Posted a day ago

Job Description

Responsibilities

  • Design, deploy, and manage high-availability, scalable cloud infrastructure using AWS services, with a focus on EKS, MongoDB, Cassandra, and Kafka.
  • Develop and implement automation using Terraform, Argo CD, GitOps workflows, and other IaC tools to provision, configure, and maintain infrastructure.
  • Ensure zero-downtime deployments through efficient CI/CD pipelines and automated rollbacks.
  • Analyse, monitor, and improve infrastructure performance, scaling, and reliability to meet high availability and disaster recovery requirements.
  • Create reusable integrations with third-party tools like CI/CD systems, monitoring solutions, and container registries to optimise and consolidate workflows.
  • Troubleshoot and resolve infrastructure and deployment issues, ensuring rapid response and root cause analysis (RCA) for production incidents.
  • Participate in an on-call rotation to provide 24/7 support for critical systems as required.
  • Collaborate with cross-functional teams to implement best practices around observability, monitoring, and logging.
  • Document infrastructure processes, configurations, and operational runbooks to support a knowledge-sharing culture.

Requirements

  • 4-8 years of professional experience in DevOps or software engineering roles, with a focus on configuring, deploying, and maintaining Kubernetes in AWS.
  • Strong proficiency in infrastructure as code (IaC) using Terraform, AWS CloudFormation, or similar tools.
  • Experience with scripting and automation using languages such as Python.
  • Experience with CI/CD pipelines and automation tools such as Concourse, Jenkins, or Ansible.
  • Experience working with teams that have delivered observability and telemetry tools and practices, such as Prometheus, Grafana, the ELK stack, distributed tracing, and performance monitoring.
  • Experience with cloud-native tools such as Istio, Argo CD, External Secrets Operator, KEDA, Karpenter, etc.
  • Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and automation.
  • Familiarity with the concepts of SLIs (Service Level Indicators), SLOs (Service Level Objectives), SLAs (Service Level Agreements), and error budgets, and the ability to define them.
  • Excellent problem-solving skills and attention to detail.
  • Experience with service mesh technologies like Istio.
  • Familiarity with tools like External Secrets Operator, KEDA, or Karpenter for secrets management and scaling workloads.
  • Certifications such as AWS Certified Solutions Architect or Kubernetes Administrator.

This job was posted by Mansi Shah from Caizin.

Job ID: 136126859