
  • Posted 22 days ago
  • Over 50 applicants

Job Description

Primary skills: Kubernetes, AWS
Secondary skills: Grafana

Key Responsibilities

Operations & Administration
  • Perform daily health checks of Kubernetes clusters (AKS/EKS/GKE).
  • Manage and troubleshoot pods, deployments, services, and namespaces.
  • Apply upgrades, patches, and cluster maintenance activities as per SOPs.
  • Handle incident tickets (P1/P2/P3), perform root cause analysis, and provide fixes or escalation.

Monitoring & Troubleshooting
  • Monitor cluster health and workloads using tools such as Prometheus, Grafana, ELK/EFK, Azure Monitor, or CloudWatch.
  • Resolve issues related to pod failures, node scaling, network policies, or storage volumes.
  • Collaborate with application teams to resolve issues in containerized workloads.

Security & Compliance
  • Manage RBAC, secrets, and config maps as per enterprise governance policies.
  • Perform image scanning and vulnerability patching, and apply compliance standards.
  • Ensure clusters adhere to IT security and audit requirements.

Automation & Maintenance
  • Support CI/CD pipelines for deploying applications into Kubernetes.
  • Use Helm/Kustomize for upgrades and configuration management.
  • Automate repetitive operational tasks with scripts (Bash, Python, PowerShell).

Collaboration & Escalation
  • Work with Cloud Platform and Application teams on incident triage.
  • Escalate complex design/architecture issues to the Cloud Engineering team.
  • Provide on-call support and after-hours incident resolution when required.

Required Skills & Experience
  • Hands-on experience with Kubernetes operations/support (24+ years).
  • Strong knowledge of containers (Docker) and workload management.
  • Experience with at least one cloud provider: Azure (AKS), AWS (EKS), or GCP (GKE).
  • Familiarity with Helm, Kustomize, and CI/CD pipelines.
  • Knowledge of monitoring tools (Prometheus, Grafana, ELK/EFK, Datadog).
  • Good understanding of RBAC, networking basics (CNI, Ingress, DNS), and storage classes.
  • Scripting knowledge (Bash, Python, PowerShell) for automating ops tasks.
  • Strong troubleshooting skills for incidents in production environments.

Nice to Have
  • Exposure to GitOps tools (ArgoCD, Flux).
  • Experience with logging/alerting integrations (PagerDuty, ServiceNow).
  • Familiarity with FinOps practices in Kubernetes (cost monitoring, resource quotas).
  • Basic knowledge of service mesh (Istio, Linkerd).

Soft Skills
  • Strong problem-solving and analytical thinking.
  • Ability to work under pressure during P1/P2 incidents.
  • Good communication skills for working with application and cloud teams.
  • Willingness to work in a 24x7 support model (rotational shifts).

More Info

Open to candidates from: Indian

Job ID: 132245647
