

Primary skills: Kubernetes, AWS
Secondary skills: Grafana

Key Responsibilities

Operations & Administration
- Perform daily health checks of Kubernetes clusters (AKS/EKS/GKE).
- Manage and troubleshoot pods, deployments, services, and namespaces.
- Apply upgrades, patches, and cluster maintenance activities as per SOPs.
- Handle incident tickets (P1/P2/P3), perform root cause analysis, and provide fixes or escalation.

Monitoring & Troubleshooting
- Monitor cluster health and workloads using tools like Prometheus, Grafana, ELK/EFK, Azure Monitor, or CloudWatch.
- Resolve issues related to pod failures, node scaling, network policies, or storage volumes.
- Collaborate with application teams to resolve issues in containerized workloads.

Security & Compliance
- Manage RBAC, secrets, and config maps as per enterprise governance policies.
- Perform image scanning, vulnerability patching, and apply compliance standards.
- Ensure clusters adhere to IT security and audit requirements.

Automation & Maintenance
- Support CI/CD pipelines for deploying applications into Kubernetes.
- Use Helm/Kustomize for upgrades and configuration management.
- Automate repetitive operational tasks with scripts (Bash, Python, PowerShell).

Collaboration & Escalation
- Work with Cloud Platform and Application teams on incident triage.
- Escalate complex design/architecture issues to the Cloud Engineering team.
- Provide on-call support and after-hours incident resolution when required.

Required Skills & Experience
- Hands-on experience with Kubernetes operations/support (24+ years).
- Strong knowledge of containers (Docker) and workload management.
- Experience with at least one cloud provider: Azure (AKS), AWS (EKS), or GCP (GKE).
- Familiarity with Helm, Kustomize, and CI/CD pipelines.
- Knowledge of monitoring tools (Prometheus, Grafana, ELK/EFK, Datadog).
- Good understanding of RBAC, networking basics (CNI, Ingress, DNS), and storage classes.
- Scripting knowledge (Bash, Python, PowerShell) for automating ops tasks.
- Strong troubleshooting skills for incidents in production environments.

Nice to Have
- Exposure to GitOps tools (ArgoCD, Flux).
- Experience with logging/alerting integrations (PagerDuty, ServiceNow).
- Familiarity with FinOps practices in Kubernetes (cost monitoring, resource quotas).
- Basic knowledge of service mesh (Istio, Linkerd).

Soft Skills
- Strong problem-solving and analytical thinking.
- Ability to work under pressure in P1/P2 incidents.
- Good communication skills for working with application and cloud teams.
- Willingness to work in 24x7 support model (rotational shifts).

Capgemini
Job ID: 132245647
Skills:
Amazon Web Services, Terraform, Ansible, PowerShell, Microsoft Azure, Bash, Kubernetes, Python, Linux Server Administration, Palo Alto VM-Series firewalls
Skills:
Virtual networks, ServiceNow, PaaS, Azure Functions, APIM, Cloud Infrastructure, Infoblox, System Center, ARM templates, Terraform, Qualys, PowerShell, IaaS, Load Balancers, Ansible, Zenoss, Azure, Azure Data Lake Store, Prisma Cloud, DevOps pipeline, Azure SQL Database, Azure Backup, Azure Monitor, Azure Security Center, ExpressRoute, AKS, Azure Site Recovery
Skills:
Power BI, PowerShell, Azure Functions, Terraform, Azure, Python, Azure OpenAI Service, Azure Advisor, Azure Blob Storage, Azure SQL Database, Azure Cost Management, Bicep, Azure Virtual Machines
Skills:
PowerShell, Bash, Terraform, Python, Azure DevOps, Azure Migrate, Azurerm, Azure Container Instances, VNet integration, AKS, AzAPI, ACR, Private Link, Azure Policy as code, Azure Site Recovery, Azure networking
Skills:
S3, Shell scripting, VPC, AWS CloudFormation, Lambda, RDS, AWS, Python, IAM, Terraform, EC2, KMS, CloudWatch, SSM, CI/CD pipelines