The Opportunity
We are seeking a Platform Specialist (Director level) to serve as the organization's top technical authority on Kubernetes and the most senior hands-on engineer for CK-Kube, our Kubernetes Cost Intelligence platform. This is a deep individual contributor role (60%+ hands-on engineering) in which you will architect, implement, and technically lead CK-Kube as the principal engineer. You will set the technical direction, write production code, and drive architectural decisions. We are not looking for a people manager; we are looking for the strongest Kubernetes systems engineer we can find.
What You'll Own
CK-Tuner-Kubernetes: Kubernetes Cost Intelligence Platform
- Architect and implement the cost allocation engine at cluster, namespace, deployment, pod, and container granularity across EKS, AKS, and GKE
- Design and build the real-time data collection pipeline: agent architecture, ClickHouse time-series storage, and gRPC streaming between agent and datastore
- Implement Karpenter integration for node lifecycle management and bin-packing optimization
- Build custom Kubernetes controllers and operators for cost policy enforcement and automated remediation
- Design shared cost distribution algorithms covering system namespaces, control plane costs, networking overhead, and idle capacity attribution
- Integrate CK-Tuner-Kubernetes with CK-Lens for a unified cloud + container cost view
Container Optimization Engine
- Design and implement container right-sizing algorithms for CPU and memory requests/limits based on real usage patterns
- Build node pool optimization logic: instance type selection, scaling policies, and bin-packing efficiency scoring
- Implement Karpenter-based spot and preemptible node policies for fault-tolerant workloads
- Build the automated right-sizing execution pipeline via CK-Tuner integration
GPU Container Cost Intelligence
- Track GPU utilization and detect idle GPUs for AI/ML workloads running on Kubernetes
- Compare GPU costs across clusters on EKS, AKS, and GKE
- Integrate with the FinOps for AI initiative for GPU pod-level cost attribution
Responsibilities
Technical Leadership
- Serve as CK-Tuner-Kubernetes's principal architect and most senior hands-on engineer
- Set architectural standards and code quality bars; mentor engineers through technical pairing and design reviews
- Drive technical roadmap and architecture decisions in partnership with Product Management
Hands-On Engineering
- Write production Go code for CK-Tuner-Kubernetes's core systems: agent data collection, metrics processing, and the cost allocation engine
- Design and implement custom Kubernetes controllers and operators
- Build and optimize the ClickHouse time-series data model for cost metrics at scale
- Implement gRPC streaming with backpressure, circuit breakers, and mTLS between agent and datastore
- Develop Karpenter-based node optimization policies and consolidation algorithms
- Performance-tune the metrics pipeline: 10-second scrape intervals, 1-minute rollups, and multi-cluster aggregation
Technical Strategy
- Design the agent data collection layer: hybrid metrics collection via Metrics API, Kubelet Summary, Kubelet Proxy, and optional Prometheus endpoints
- Architect the ClickHouse time-series schema with materialized views for multi-resolution aggregation (5m, 1h, 1d)
- Build the delta processing pipeline: in-memory state comparison with ring buffers (discovery 10K, metrics 50K, events 100K)
- Design cost allocation algorithms for shared resources: control plane, networking, system namespaces, and idle capacity
- Architect multi-cloud Kubernetes support (EKS primary, AKS/GKE Phase 4) with provider-specific pricing API integrations
- Build integration points with CK-Lens, CK-Tuner, and CK-Intelligence
Technical Landscape You'll Navigate
Kubernetes & Container Orchestration
- Platforms: EKS (Fargate, managed node groups), AKS, GKE (Autopilot, standard), on-prem Kubernetes
- Ecosystem: OpenCost, Karpenter, Helm, Kubernetes Operators, K8s API Server
- Resource Management: Requests/limits, node autoscaling, pod scheduling, bin-packing, spot/preemptible nodes
- Kubernetes Internals: Custom controllers, operators, CRDs, admission webhooks, scheduler plugins, informers, leader election, reconciliation loops
Data Engineering
- ClickHouse (time-series analytics), Apache Pulsar/NATS JetStream (message broker), gRPC bidirectional streaming with backpressure
Cloud Providers
- AWS: EKS, Fargate, EC2 (GPU instances), S3, CloudWatch, Cost & Usage Reports
- Azure: AKS, Azure Monitor, Azure Billing APIs
- GCP: GKE, GKE Autopilot, BigQuery Billing Export
Role Requirements
Experience
- 10+ years in systems/platform/infrastructure engineering with deep hands-on Kubernetes production experience (EKS, AKS, or GKE)
- Track record of personally designing and implementing complex distributed systems, not just overseeing teams that build them
- Experience building Kubernetes tooling: operators, controllers, CLI tools, or platform products
- Prior work on cost/resource optimization, observability, or infrastructure intelligence platforms preferred
- Experience with container orchestration at scale (multi-cluster, multi-cloud preferred)
Technical Depth
- Expert-level: Kubernetes internals (scheduler, controller-manager, kubelet, API server), resource management, pod lifecycle
- Hands-on: Custom controller/operator development using controller-runtime or client-go
- Production experience with Karpenter, OpenCost, or equivalent node/cost optimization tools
- Strong Go proficiency (CK-Kube is 100% Go); experience with gRPC, Protocol Buffers
- ClickHouse or similar OLAP/time-series database experience for high-throughput metrics
- eBPF, CNI, or CSI plugin development experience is a strong plus
Leadership
- Ability to operate in founding-engineer mode: small team, high ownership, rapid shipping
- Track record of setting technical direction and architectural standards that scale beyond your own code
- Comfortable wearing multiple hats: architecture, implementation, code review, technical documentation, product input
- Influence through technical excellence, design documents, and working code, not through organizational authority
- Strong communicator who can influence across functions and levels