Job Description
Linux System Administration: Providing strong Linux system administration skills, including configuration, patching, troubleshooting, performance monitoring, security hardening, shell scripting for automation, and user and permissions management. Ensuring the stability and security of the underlying server platform that hosts our open-source infrastructure.
Kubernetes Cluster Lifecycle Management: Administering and managing the complete lifecycle of Kubernetes clusters, including provisioning, configuration, hardening, monitoring, logging, health checks, remediation, backup, and recovery. Developing and executing comprehensive Kubernetes upgrade strategies, covering planning, environment preparation, component upgrades (control plane and worker nodes), thorough testing and validation, and robust rollback procedures. Managing and troubleshooting associated Kubernetes components, such as container runtimes (Container Runtime Interface, CRI), network plugins (Container Network Interface, CNI), storage drivers (Container Storage Interface, CSI), Ingress controllers, and potentially service meshes. Implementing and enforcing security best practices within the Kubernetes environment, including RBAC, network policies, and secret management. Optimizing Kubernetes cluster performance through resource management, network tuning, and identifying and resolving bottlenecks.
Knowledge of Azure AKS and Google GKE (Added Advantage): Experience with managed Kubernetes services in the cloud is a significant plus. AKS (Azure Kubernetes Service): Understanding the architecture, features, and management of AKS clusters, including deploying and managing clusters, scaling nodes, configuring networking and storage, and integrating with other Azure services. GKE (Google Kubernetes Engine): Familiarity with the architecture, features, and management of GKE clusters, including deploying and managing clusters, using node pools, configuring networking and storage, and integrating with other Google Cloud Platform services.