Key Responsibilities
Kubernetes & EKS Platform Engineering
- Architect, deploy, and operate production-grade Kubernetes clusters on AWS EKS
- Automate EKS provisioning with EKS Blueprints and manage the lifecycle of cluster add-ons
- Plan and execute Kubernetes and EKS version upgrades with minimal service disruption
Autoscaling & Compute Optimization
- Design and implement Karpenter-based node autoscaling for dynamic workloads
- Optimize compute resources for cost efficiency, performance, and high availability
Service Mesh & Traffic Management
- Design and operate Istio service mesh (including sidecar and ambient mesh models)
- Implement mTLS and advanced traffic management policies such as retries, circuit breaking, and timeouts
Security, Policy & Runtime Protection
- Implement Kubernetes governance using Kyverno and OPA/Gatekeeper
- Operate Falco for runtime threat detection and security incident investigation
- Integrate security and compliance controls into GitOps workflows
Infrastructure as Code & Automation
- Build and maintain reusable Terraform modules for AWS infrastructure (VPC, EKS, Transit Gateway, etc.)
- Implement Terragrunt-based multi-account and multi-region infrastructure setups
- Drive automation to reduce manual operations and improve scalability
GitOps & Platform Operations
- Design and operate Argo CD for GitOps-based deployments and platform operations
- Define Git-based promotion workflows and access control models across environments
Observability & SRE Practices
- Design and maintain monitoring and alerting systems using Prometheus
- Participate in incident response, root cause analysis, and reliability engineering improvements
- Reduce operational toil through automation and self-service capabilities
Security & Compliance
- Own remediation of security findings from tools such as Wiz across AWS and Kubernetes environments
- Collaborate with security teams to implement preventive security guardrails and best practices