Design, implement, and manage highly secure, scalable, and resilient infrastructure, with Kubernetes as the central orchestration layer
Own and automate infrastructure provisioning using IaC tools (e.g., Terraform, Pulumi), focusing on production-grade Kubernetes clusters
Develop robust automation frameworks and tools in Python/Go/Shell to streamline platform operations, deployment workflows, and day-to-day infra tasks
Lead the observability strategy: implement and maintain metrics, logging, and tracing solutions to ensure platform performance, reliability, and cost-efficiency
Monitor and manage Kubernetes workloads, health probes, resource limits/quotas, pod autoscaling, and network policies at scale
Drive CI/CD pipeline improvements, including GitOps workflows, container image management, and zero-downtime deployments using Helm or Kustomize
Investigate and resolve infrastructure and application-level issues through in-depth troubleshooting and debugging
Participate in incident response, RCA, and blameless postmortems; contribute to system reliability and SRE best practices
Evaluate and implement cutting-edge DevOps/Kubernetes tooling, while evangelizing infrastructure and coding standards across teams
Required Skills:
3+ years of hands-on experience in DevOps/SRE/Platform Engineering roles with production infrastructure
Deep experience with Kubernetes: deploying, scaling, and maintaining services across clusters (EKS, GKE, AKS, or self-managed)
Proficient with at least one major cloud platform (AWS, GCP, or Azure)
Strong proficiency in Infrastructure as Code (IaC) using Terraform, Pulumi, or CloudFormation
Solid programming/scripting ability in Python, Go, Bash, or Java; comfortable writing automation scripts and building tools
In-depth knowledge of Linux internals, networking, and container runtimes (Docker, containerd)
Experience with observability and monitoring tools like Prometheus/Grafana, ELK, DataDog, or New Relic
Comfortable working with microservices, containerized apps, and REST-based architectures
Passionate about building reliable infrastructure through code and working in a fast-paced, engineering-driven culture