Deep.BI

Principal Kubernetes Networking Engineer, Calico SME

Job Description

Location: Remote, US working hours

Type: Full-time employee or long-term contractor

Reports to: Engineering Lead, Platform & Database Operations

Department: Engineering, 24/7 on-call rotation

About Deep.BI

Deep.BI is an independent engineering and support company with a deep passion for open-source technologies. We specialize in analytical databases, scalable big data platforms, agentic AI systems, and Kubernetes-powered infrastructure. Our engineers help enterprises build, scale, and support mission-critical data platforms using technologies such as Apache Druid, StarRocks, Apache Flink, Kafka, and Cassandra in cloud, bare-metal, and hybrid environments. Our customers include Fortune 500 companies across the finance, telecom, IT infrastructure, and pharma industries.

About the role

Enterprise-scale Apache Druid, StarRocks, and Apache Flink deployments increasingly run on Calico-based Kubernetes in production, so we are expanding our Kubernetes networking practice. This role spans in-depth Calico work across customer engagements, Kubernetes networking more broadly (service networking, ingress and egress, multi-cluster, encrypted overlays), and the networking layer of the distributed data platforms (Druid, StarRocks, Flink, Kafka, Cassandra) running on top.

What we're looking for

A senior site reliability engineer with multiple years of experience operating large-scale Kubernetes installations and network fabrics. To succeed in this role, you need deep, intimate knowledge of Kubernetes and Calico internals. You should be well acquainted with eBPF, BGP, network policies, iptables, nftables, the Linux kernel, low-level networking concepts, multi-cluster connectivity, and multi-region cloud deployments.

What you'll work on

  • Calico architecture, tuning, upgrades, and incident response across customer environments.
  • BGP design, IP pool planning, NetworkPolicy strategy, eBPF dataplane decisions, and Tigera Calico Enterprise versus open-source Calico tradeoffs.
  • Kubernetes networking design and troubleshooting on AWS and on bare metal. 
  • Diagnosing live production issues end-to-end, from pod interface to underlay.
  • On-call escalations for production Calico incidents at Fortune 500 scale.
  • The internal Calico playbook: reference architectures, runbooks, training materials, and reviews of how Calico is deployed and operated on customer clusters.
  • Leading Calico capability building across the engineering team and representing best practices in customer design reviews and architecture conversations.

Must haves

  • Comfort troubleshooting, maintaining, and evolving large-scale Kubernetes clusters and Calico networking fabrics spanning thousands of cluster nodes in mission-critical production environments.
  • Several years of production Calico operations, including BGP peering, eBPF dataplane, NetworkPolicy at scale, and upgrade or migration paths between Calico versions and modes.
  • Strong Kubernetes operations experience across kube-proxy modes, ingress and egress patterns, DNS, service networking, multi-cluster connectivity, and cluster lifecycle.
  • Advanced Linux networking: TCP/IP, iptables/nftables, netfilter, conntrack, network namespaces, tcpdump, ss, ip, eBPF tooling. 
  • Ability to read and understand a packet capture and explain what the kernel is doing.
  • Hands-on with at least one major cloud (AWS strongly preferred) and bare metal. 
  • Experience shipping Calico both on EKS and on a hardware fleet with a leaf-spine topology and BGP peering to top-of-rack switches.
  • Comfortable in front of senior engineering and SRE audiences.
  • Calm demeanor during production incidents.
  • Demonstrated ability to document incidents clearly, concisely and quickly.

Strongly preferred

  • Experience with Cilium-based networking. 
  • Tigera Calico Enterprise or Calico Cloud experience, including operator, observability and policy tooling.
  • Multi-cluster networking patterns: Calico cluster mesh, Submariner, federated policy.
  • CKA, CKS or Calico / Cilium / Isovalent certifications.

Nice to have

  • Contributions to Calico, Cilium, or related CNI projects.
  • GPU networking and AI infrastructure background: RDMA, RoCE, NVIDIA networking stacks, GPU-aware scheduling and topology.
  • Data center networking depth: leaf-spine, BGP underlay, MTU and jumbo frames, ECMP.

Why work with Deep.BI

  • Competitive compensation.
  • Fully remote position with working hours overlapping US business hours.
  • Compensated on-call rotation.
  • Support in-house tech teams and centers of excellence from some of the most prestigious global companies in the tech industry.
  • Join an international team of technology experts and work on a variety of diverse subject matters.
  • Access to unique opportunities that will put you at the forefront of these rapidly advancing technologies. 

Job ID: 147486919