HARMAN India

Kubernetes Expert

  • Posted 13 days ago

Job Description

Job Summary

We are seeking a highly skilled Site Reliability Engineer (SRE) to architect, operate, and scale Kubernetes-based infrastructure across on-premises and cloud environments. This role emphasizes manual application deployments, observability, security, and uptime accountability, while promoting resilience, automation, and operational excellence.

You will work closely with cross-functional teams to ensure performance, availability, and reliability of mission-critical services, while evolving platform capabilities using modern tools like Terraform, Helm, and Argo CD.

Key Responsibilities

  • Design, build, and manage highly available Kubernetes clusters across hybrid environments (on-premises and cloud platforms such as AWS EKS, Azure AKS).
  • Deploy and manage applications manually using tools such as kubectl and Helm, with growing integration of GitOps practices (e.g., Argo CD).
  • Implement and manage observability stacks using Prometheus, Grafana, Loki, and Mimir to monitor infrastructure, applications, and system performance.
  • Define, monitor, and improve SLA/SLO/SLI metrics and alerting systems to ensure platform reliability.
  • Automate provisioning and configuration of infrastructure using Terraform, Helm, and scripting languages (e.g., Bash, Python).
  • Plan, implement, and test backup and disaster recovery (DR) strategies using tools such as Velero and Commvault.
  • Manage Kubernetes-native networking, storage, and security configurations (Ceph, NFS, Ingress, PodSecurityPolicies, etc.).
  • Configure and enforce Kubernetes security best practices using RBAC, OPA/Gatekeeper, NetworkPolicies, and secrets management tools.
  • Integrate and operate Kubernetes ecosystem tools such as Karpenter, MicroK8s, Service Meshes, and kubectl plugins.
  • Conduct root cause analysis (RCA) and lead resolution efforts for incidents.
  • Participate in the on-call rotation for platform availability and incident management.
  • Maintain up-to-date documentation, architecture diagrams, runbooks, and SOPs.
  • Mentor engineers and advocate for Kubernetes, security, observability, and deployment best practices across teams.
  • Stay informed about industry trends in container orchestration, GitOps, security, and cloud-native tooling.

Required Qualifications

  • 7-9 years of IT/Infrastructure/DevOps experience, with 5+ years in Kubernetes operations in production environments.
  • Strong hands-on experience in Kubernetes architecture, cluster operations, and manual application deployment practices.
  • Intermediate-level experience in Kubernetes Security, including:
    • Cluster hardening, secrets management
    • Pod Security Standards (PSS), OPA/Gatekeeper
    • Network policies, image scanning, and runtime protections
  • Intermediate experience with Argo CD for GitOps-style Kubernetes deployments.
  • Solid proficiency in Linux system administration (Ubuntu, CentOS, RHEL) and troubleshooting.
  • Hands-on experience with Kubernetes-native storage (e.g., Ceph, NFS) and persistent volume provisioning.
  • Strong familiarity with observability tools: Grafana, Prometheus, Loki, Mimir, etc.
  • Proficiency in Infrastructure as Code using Terraform, Helm, and scripting.
  • Experience with Velero, Commvault, or similar for backup and DR.
  • Experience operating and optimizing cloud-native Kubernetes platforms such as EKS and AKS.
  • Exposure to tools like Karpenter, MicroK8s, Service Mesh, and Ingress Controllers.
  • Familiarity with AI/ML workloads running on Kubernetes is a plus.
  • Excellent collaboration, communication, documentation, and incident resolution skills.

Preferred Qualifications

  • Kubernetes certifications: CKA, CKAD, or CKS.
  • Strong understanding of container security, networking, and distributed system architecture.
  • Experience using Portainer for container and Kubernetes management.
  • Advanced knowledge of Grafana and other enterprise-grade observability tools.
  • Experience managing large-scale Kubernetes clusters (200+ nodes) is highly preferred.
  • Prior experience supporting production-grade, high-availability platforms and environments.

Why Join Us

  • Help shape and operate mission-critical, modern Kubernetes infrastructure.
  • Be part of a team focused on platform reliability, observability, and secure operations.
  • Contribute to and influence the evolution of deployment and automation practices (GitOps, IaC).
  • Access cutting-edge tools, industry best practices, and continuous learning.

Enjoy competitive compensation, flexible working options, and a growth-focused engineering culture.

Job ID: 133098173