CloudKeeper

CloudKeeper - Platform Specialist - Kubernetes

10-12 Years
  • Posted 22 days ago

Job Description

The Opportunity

We are seeking a Platform Specialist (Director level) to serve as the organization's top technical authority on Kubernetes and the most senior hands-on engineer for CK-Kube, our Kubernetes Cost Intelligence platform. This is a deep individual contributor role (60%+ hands-on engineering) in which you will architect, implement, and technically lead CK-Kube as the principal engineer. You will set the technical direction, write production code, and drive architectural decisions. We are not looking for a people manager; we are looking for the strongest Kubernetes systems engineer we can find.

What You'll Own

CK-Tuner-Kubernetes: Kubernetes Cost Intelligence Platform

  • Architect and implement the cost allocation engine at cluster, namespace, deployment, pod, and container granularity across EKS, AKS, and GKE
  • Design and build the real-time data collection pipeline: agent architecture, ClickHouse time-series storage, and gRPC streaming between agent and datastore
  • Implement Karpenter integration for node lifecycle management and bin-packing optimization
  • Build custom Kubernetes controllers and operators for cost policy enforcement and automated remediation
  • Design shared cost distribution algorithms for system namespaces, control plane costs, networking overhead, and idle capacity attribution
  • Integrate CK-Tuner-Kubernetes with CK-Lens for a unified cloud + container cost view

Container Optimization Engine

  • Design and implement container right-sizing algorithms for CPU and memory requests/limits based on real usage patterns
  • Build node pool optimization logic: instance type selection, scaling policies, and bin-packing efficiency scoring
  • Implement Karpenter-based spot and preemptible node policies for fault-tolerant workloads
  • Build the automated right-sizing execution pipeline via CK-Tuner integration

GPU Container Cost Intelligence

  • GPU utilization tracking and idle GPU detection for AI/ML workloads running on Kubernetes
  • Multi-cluster GPU cost comparison across EKS, AKS, and GKE
  • Integration with the FinOps for AI initiative for GPU pod-level cost attribution

Responsibilities

Technical Leadership

  • Serve as CK-Tuner-Kubernetes's principal architect and most senior hands-on engineer
  • Set architectural standards and code quality bars; mentor engineers through technical pairing and design reviews
  • Drive technical roadmap and architecture decisions in partnership with Product Management

Hands-On Engineering

  • Write production Go code for CK-Tuner-Kubernetes's core systems: agent data collection, metrics processing, and the cost allocation engine
  • Design and implement custom Kubernetes controllers and operators
  • Build and optimize the ClickHouse time-series data model for cost metrics at scale
  • Implement gRPC streaming with backpressure, circuit breakers, and mTLS between agent and datastore
  • Develop Karpenter-based node optimization policies and consolidation algorithms
  • Performance-tune the metrics pipeline: 10-second scrape intervals, 1-minute rollups, multi-cluster aggregation

Technical Strategy

  • Design the agent data collection layer: hybrid metrics collection via Metrics API, Kubelet Summary, Kubelet Proxy, and optional Prometheus endpoints
  • Architect the ClickHouse time-series schema with materialized views for multi-resolution aggregation (5m, 1h, 1d)
  • Build the delta processing pipeline: in-memory state comparison with ring buffers (discovery 10K, metrics 50K, events 100K)
  • Design cost allocation algorithms for shared resources: control plane, networking, system namespaces, and idle capacity
  • Architect multi-cloud Kubernetes support (EKS primary, AKS/GKE Phase 4) with provider-specific pricing API integrations
  • Build integration points with CK-Lens, CK-Tuner, and CK-Intelligence

Technical Landscape You'll Navigate

Kubernetes & Container Orchestration

  • Platforms: EKS (Fargate, managed node groups), AKS, GKE (Autopilot, standard), on-prem Kubernetes
  • Ecosystem: OpenCost, Karpenter, Helm, Kubernetes Operators, K8s API Server
  • Resource Management: Requests/limits, node autoscaling, pod scheduling, bin-packing, spot/preemptible nodes
  • Kubernetes Internals: Custom controllers, operators, CRDs, admission webhooks, scheduler plugins, informers, leader election, reconciliation loops

Data Engineering

  • ClickHouse (time-series analytics), Apache Pulsar/NATS JetStream (message broker), gRPC bidirectional streaming with backpressure

Cloud Providers

  • AWS: EKS, Fargate, EC2 (GPU instances), S3, CloudWatch, Cost & Usage Reports
  • Azure: AKS, Azure Monitor, Azure Billing APIs
  • GCP: GKE, GKE Autopilot, BigQuery Billing Export

Role Requirements

Experience

  • 10+ years in systems/platform/infrastructure engineering with deep hands-on Kubernetes production experience (EKS, AKS, or GKE)
  • Track record of personally designing and implementing complex distributed systems, not just overseeing teams that build them
  • Experience building Kubernetes tooling: operators, controllers, CLI tools, or platform products
  • Prior work on cost/resource optimization, observability, or infrastructure intelligence platforms preferred
  • Experience with container orchestration at scale; multi-cluster, multi-cloud experience preferred

Technical Depth

  • Expert-level: Kubernetes internals (scheduler, controller-manager, kubelet, API server), resource management, pod lifecycle
  • Hands-on: Custom controller/operator development using controller-runtime or client-go
  • Production experience with Karpenter, OpenCost, or equivalent node/cost optimization tools
  • Strong Go proficiency (CK-Kube is 100% Go); experience with gRPC, Protocol Buffers
  • ClickHouse or similar OLAP/time-series database experience for high-throughput metrics
  • eBPF, CNI, or CSI plugin development experience is a strong plus

Leadership

  • Ability to operate in a founding-engineer mode: small team, high ownership, rapid shipping
  • Track record of setting technical direction and architectural standards that scale beyond your own code
  • Comfortable wearing multiple hats: architecture, implementation, code review, technical documentation, product input
  • Influence through technical excellence, design documents, and working code, not through organizational authority
  • Strong communicator who can influence across functions and levels

(ref:hirist.tech)

Job ID: 144215979