Position Overview
Looking for a network engineer with experience in datacenter environments and at least light programming experience.
SONiC programmatic iterative configuration (gNMI/YANG, SWSS) experience required
SONiC base configuration (L2, MCLAG, LAG/PortChannel, BGP, BFD, etc.) experience preferred
FRR experience preferred (OSPF)
OpenGear experience nice to have
Systems programming language (C, C++, Golang, Rust, etc.) experience: light experience preferred, stronger experience nice to have
Scripting language (Python, Bash, etc.) experience required
Linux administration (Bash, systemd units, general system navigation) experience preferred
Virtual networking (VXLAN) experience preferred
AWS networking (VPC, Direct Connect) nice to have
Task Expectations
- Programmatic Iterative Configuration of SONiC Switches (YANG/gNMI, SWSS, etc.)
Expected Experience/Abilities:
Has previously used the above to configure, or can trivially identify how to implement, CRUD operations (or at least CRD: create, read, delete) against constructs such as, but not limited to:
Physical and sub-interfaces
VXLAN/VNI
VRF
ACL
At minimum, must provide correctly functioning examples
Functionality will ultimately be written in Golang. The network engineer is only expected to identify, document, and demonstrate interface functionality for the SingleStore team to implement.
The ability to implement the CRUD/CRD functionality as a Golang library/module would be a plus but is not expected
The actual virtual networking control plane implementation is expected to be the responsibility of the SingleStore team
Contributions from the network engineer here would be high value, freeing up the team to focus on the storage implementation
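Because the final implementation is in Golang, demonstrated CRD operations are easiest to hand over if they already have a Golang shape. A minimal sketch follows, assuming the SONiC CONFIG_DB layout of TABLE|key entries holding field/value pairs (e.g. VRF|Vrf01, VXLAN_TUNNEL_MAP|vtep|map_1000_Vlan100). The Store interface and the in-memory memStore backend are hypothetical stand-ins; a real backend would issue gNMI or swss calls, and table and field names should be checked against the SONiC YANG models on the target image.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Fields holds the field/value pairs of one CONFIG_DB-style entry.
type Fields map[string]string

// Store is a hypothetical CRD interface. A real backend would translate
// these calls into gNMI Set/Get requests or swss-common table writes.
type Store interface {
	Create(table string, key []string, f Fields) error
	Read(table string, key []string) (Fields, bool)
	Delete(table string, key []string) error
	Keys(table string) []string
}

// memStore is an in-memory stand-in used only to illustrate the semantics.
type memStore struct{ entries map[string]Fields }

func newMemStore() *memStore { return &memStore{entries: map[string]Fields{}} }

// dbKey joins a table name and key parts with "|", as CONFIG_DB keys do.
func dbKey(table string, key []string) string {
	return table + "|" + strings.Join(key, "|")
}

func (m *memStore) Create(table string, key []string, f Fields) error {
	k := dbKey(table, key)
	if _, exists := m.entries[k]; exists {
		return fmt.Errorf("create %s: already exists", k)
	}
	c := Fields{} // copy so later caller edits do not leak into the store
	for name, v := range f {
		c[name] = v
	}
	m.entries[k] = c
	return nil
}

func (m *memStore) Read(table string, key []string) (Fields, bool) {
	f, ok := m.entries[dbKey(table, key)]
	return f, ok
}

func (m *memStore) Delete(table string, key []string) error {
	k := dbKey(table, key)
	if _, ok := m.entries[k]; !ok {
		return fmt.Errorf("delete %s: not found", k)
	}
	delete(m.entries, k)
	return nil
}

// Keys returns the sorted full keys of every entry in one table.
func (m *memStore) Keys(table string) []string {
	var out []string
	for k := range m.entries {
		if strings.HasPrefix(k, table+"|") {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	var s Store = newMemStore()
	// Illustrative entries; names and field values are hypothetical.
	_ = s.Create("VRF", []string{"Vrf01"}, Fields{"vni": "1000"})
	_ = s.Create("VLAN_SUB_INTERFACE", []string{"Ethernet0.10"}, Fields{"admin_status": "up"})
	_ = s.Create("VXLAN_TUNNEL", []string{"vtep"}, Fields{"src_ip": "10.0.0.1"})
	_ = s.Create("VXLAN_TUNNEL_MAP", []string{"vtep", "map_1000_Vlan100"}, Fields{"vni": "1000", "vlan": "Vlan100"})
	fmt.Println(s.Keys("VXLAN_TUNNEL")) // [VXLAN_TUNNEL|vtep]
	_ = s.Delete("VRF", []string{"Vrf01"})
	_, ok := s.Read("VRF", []string{"Vrf01"})
	fmt.Println("Vrf01 present:", ok) // false
}
```

Update is omitted because CRD is the stated minimum; the same interface could grow an Update method once the underlying gNMI/swss behavior for partial field changes is confirmed.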
- Base Configuration of SONiC Switches
Inband Configuration:
L2, MCLAG, LAG/PortChannel, BGP, BFD
Set up anycast addresses for the metadata service IP, SAG (static anycast gateway), etc.
Bifurcated Spine-Leaf Topology (Inband):
Each side of the aisle has 2x spines, to be configured as an MCLAG pair
Each spine has 2x connections to each other spine, to be bundled into a LAG
Each spine has 2x connections to each ToR/leaf on the same side, to be bundled into a LAG
Each side of the aisle has its own private AS
ToR-to-compute-node connections use breakouts; ToR-to-storage-node connections are standard
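Code-driven base configuration starts from an explicit list of bundles. A minimal sketch that enumerates the inband LAGs for one side of the aisle, assuming "each other spine" means the MCLAG peer on the same side and that every LAG has 2 members as described. The device naming scheme and the ToR count per side are hypothetical, and ToR-to-node breakouts are not modeled.

```go
package main

import "fmt"

// Bundle is one LAG/PortChannel and the number of physical member links.
type Bundle struct {
	Role    string // "mclag-peer" or "spine-tor"
	A, B    string // endpoint names (hypothetical naming scheme)
	Members int
}

// inbandBundles lists the LAGs on one side of the aisle, assuming two
// MCLAG spines joined by a 2-member peer LAG, plus a 2-member LAG from
// each spine to each ToR on the same side.
func inbandBundles(side string, torsPerSide int) []Bundle {
	s1, s2 := side+"-spine1", side+"-spine2"
	out := []Bundle{{Role: "mclag-peer", A: s1, B: s2, Members: 2}}
	for i := 1; i <= torsPerSide; i++ {
		tor := fmt.Sprintf("%s-tor%d", side, i)
		for _, sp := range []string{s1, s2} {
			out = append(out, Bundle{Role: "spine-tor", A: sp, B: tor, Members: 2})
		}
	}
	return out
}

// cableCount sums member links across bundles.
func cableCount(bs []Bundle) int {
	n := 0
	for _, b := range bs {
		n += b.Members
	}
	return n
}

func main() {
	// Four ToRs per side is an arbitrary example value.
	for _, side := range []string{"a", "b"} {
		bs := inbandBundles(side, 4)
		for _, b := range bs {
			fmt.Printf("%-10s %s <-> %s x%d\n", b.Role, b.A, b.B, b.Members)
		}
		fmt.Printf("side %s: %d bundles, %d cables\n", side, len(bs), cableCount(bs))
	}
}
```

Under these assumptions each side needs 2 + 4N spine-facing cables for N ToRs (18 for N = 4), which is a quick check against the cabling plan before generating PortChannel and MCLAG config from the same list.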
Spine-Leaf Topology (Out Of Band):
Each side of the aisle has 1x spine
Each spine connects to each ToR/leaf on both sides
Currently each side of the aisle has its own private AS
It can be argued this should be a single AS
L3 configuration is currently hardcoded
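Whichever way the AS question is settled, a config generator can guard the ASN choice. A minimal sketch, assuming the ASNs should come from the RFC 6996 private-use ranges (64512 to 65534 for 2-byte, 4200000000 to 4294967294 for 4-byte). The example ASNs are hypothetical.

```go
package main

import "fmt"

// isPrivateASN reports whether asn falls in an RFC 6996 private-use range:
// 64512-65534 (2-byte) or 4200000000-4294967294 (4-byte).
func isPrivateASN(asn uint32) bool {
	return (asn >= 64512 && asn <= 65534) ||
		(asn >= 4200000000 && asn <= 4294967294)
}

// sessionType returns the BGP session type between two speakers:
// the same AS on both ends is iBGP, different ASes are eBGP.
func sessionType(localAS, peerAS uint32) string {
	if localAS == peerAS {
		return "ibgp"
	}
	return "ebgp"
}

func main() {
	// Hypothetical ASNs, one per side of the aisle.
	sideA, sideB := uint32(65001), uint32(65002)
	fmt.Println(isPrivateASN(sideA), isPrivateASN(sideB)) // true true
	fmt.Println(sessionType(sideA, sideB))                // ebgp
	fmt.Println(sessionType(sideA, sideA))                // ibgp
}
```

The design trade-off shows up in sessionType: with one AS per side, sessions between sides are eBGP; collapsing to a single AS turns them into iBGP, which then needs a full mesh or route reflectors.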
Additional Notes:
The spine model in use has insufficient resources for a unified DHCP stack (we had to settle on this model due to tariff season); ISC dhcpd is usable
Server BMCs were previously given static IPs and/or infinite DHCP leases by the vendor, and require crash-carting/a manual full reset before they will use DHCP
Switch management ports are physically connected but not currently configured to be reachable via the OOB network
PDU management ports do not use DHCP and require on-site troubleshooting to bring them into the network
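Since ISC dhcpd is usable on the spines, BMC reservations can be generated from an inventory instead of edited by hand, with finite leases so devices do not end up holding an address indefinitely again. A minimal sketch using Go's text/template to emit a dhcpd.conf subnet block (subnet, host, hardware ethernet, fixed-address, default-lease-time, and max-lease-time are standard dhcpd.conf statements). The subnet, lease length, and MAC/IP pairs are hypothetical. Generating config does not remove the need to reset BMCs that are still pinned to vendor settings.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Reservation pins one BMC's MAC address to a management IP.
type Reservation struct{ Host, MAC, IP string }

// Subnet is one OOB management subnet served by ISC dhcpd.
type Subnet struct {
	Network, Netmask, Router string
	LeaseSec                 int // finite lease, so clients renew rather than hold forever
	Hosts                    []Reservation
}

// Hosts are declared inside the subnet so they inherit its lease times.
var dhcpdConf = template.Must(template.New("dhcpd").Parse(`subnet {{.Network}} netmask {{.Netmask}} {
  option routers {{.Router}};
  default-lease-time {{.LeaseSec}};
  max-lease-time {{.LeaseSec}};
{{- range .Hosts}}
  host {{.Host}} {
    hardware ethernet {{.MAC}};
    fixed-address {{.IP}};
  }
{{- end}}
}
`))

// render returns the dhcpd.conf fragment for one subnet.
func render(s Subnet) (string, error) {
	var b strings.Builder
	if err := dhcpdConf.Execute(&b, s); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	// All addresses and MACs are hypothetical examples.
	out, err := render(Subnet{
		Network: "10.10.0.0", Netmask: "255.255.255.0", Router: "10.10.0.1",
		LeaseSec: 3600,
		Hosts: []Reservation{
			{Host: "bmc-a01", MAC: "00:00:5e:00:53:01", IP: "10.10.0.11"},
			{Host: "bmc-a02", MAC: "00:00:5e:00:53:02", IP: "10.10.0.12"},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```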
Coordination:
Coordinate with DevOps on switch integration with monitoring
Transition from ad-hoc to code-driven base configuration
Coordinate with DevOps on switch provisioning (ZTP or otherwise)
Coordinate with DevOps on SONiC build pipeline
- Palo Alto Firewall Configuration Remediation
Transition from ad-hoc to code-driven configuration
Multipath & Traffic Handling:
Ensure multipath is functioning correctly
The firewall rules engine appears to favor a single source interface for all src/dst pairs, resulting in erroneous packet drops
Ensure the upstream egress/ingress A/P (active/passive) setup is functioning correctly
Will need to work with the network team of the colocation vendor providing IP transit to remedy the IP transit currently having only one functioning leg
May require on-site work/coordination
Direct Connect:
Ensure Direct Connect multipath is working correctly
Ensure no overly eager security features are negatively impacting legitimate traffic (session drops/throttling, unreasonable latency; some traffic currently takes a 200 ms hit)
Ensure no unlicensed security features are enabled (DNSSEC is currently erroneously enabled)
Additional Configuration:
Configure NAT for the public IPs available for use
Interzone traffic rules are currently permissive; a more mature tiered scheme is necessary for the long term
Coordinate with DevOps on firewall integration with monitoring
Physical Topology:
2x OpenGear OM2224 spines
16x OpenGear IM7248 ToRs
Current State:
Spines are routable, providing a loop for the firewall management ports
Cellular is not active
ToRs lack Ethernet routing
All end-device access currently through nested console sessions
Requirements:
FRR experience required, OpenGear experience bonus
Challenges & Fixes:
OpenGear cellular fallback does not work well with multipath (it destroys routing when triggered)
Should instead be implemented manually using a systemd timer with heartbeats over multiple paths
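The manual replacement can be a Type=oneshot service triggered by a systemd timer (e.g. via OnUnitActiveSec=) that probes every wired path and only fails over when all of them have been down for several consecutive runs. A minimal sketch of that check, assuming source-based policy routing per uplink so that binding the probe's source address pins its egress path. Path names, addresses, the state file location, and the 3/5 thresholds are hypothetical, and the code that actually enables the modem and swaps routes is deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
	"time"
)

// Path is one wired uplink to probe (values below are hypothetical).
type Path struct {
	Name   string
	Source string // local address on that uplink, used to bind the probe
	Target string // host:port reachable through that uplink
}

// State is carried between timer runs in a small JSON file.
type State struct {
	DownRuns int  // consecutive runs with every wired path down
	UpRuns   int  // consecutive runs with at least one wired path up
	OnCell   bool // whether cellular fallback is currently active
}

// probe reports whether a TCP connection succeeds from p.Source to p.Target.
// Binding the source only pins the egress path if source-based policy
// routing is configured per uplink.
func probe(p Path, timeout time.Duration) bool {
	d := net.Dialer{Timeout: timeout, LocalAddr: &net.TCPAddr{IP: net.ParseIP(p.Source)}}
	c, err := d.Dial("tcp", p.Target)
	if err != nil {
		return false
	}
	c.Close()
	return true
}

// step applies one round of results with hysteresis: fail over only after
// failAfter all-down runs, fail back only after recoverAfter up runs.
func step(s State, anyUp bool, failAfter, recoverAfter int) State {
	if anyUp {
		s.DownRuns, s.UpRuns = 0, s.UpRuns+1
		if s.OnCell && s.UpRuns >= recoverAfter {
			s.OnCell = false
		}
	} else {
		s.UpRuns, s.DownRuns = 0, s.DownRuns+1
		if !s.OnCell && s.DownRuns >= failAfter {
			s.OnCell = true
		}
	}
	return s
}

func main() {
	const stateFile = "/var/lib/oob-heartbeat/state.json" // hypothetical location
	paths := []Path{
		{Name: "wan1", Source: "192.0.2.2", Target: "198.51.100.1:22"},
		{Name: "wan2", Source: "203.0.113.2", Target: "198.51.100.1:22"},
	}
	var s State
	if b, err := os.ReadFile(stateFile); err == nil {
		_ = json.Unmarshal(b, &s)
	}
	anyUp := false
	for _, p := range paths {
		ok := probe(p, 3*time.Second)
		fmt.Printf("%s up=%v\n", p.Name, ok)
		anyUp = anyUp || ok
	}
	prev := s.OnCell
	s = step(s, anyUp, 3, 5)
	if s.OnCell != prev {
		// Enabling/disabling the modem and swapping routes is site-specific
		// and intentionally left out of this sketch.
		fmt.Println("transition: cellular active =", s.OnCell)
	}
	if b, err := json.Marshal(s); err == nil {
		_ = os.WriteFile(stateFile, b, 0o644)
	}
}
```

Keeping the decision in a pure step function means a single lost heartbeat cannot trigger the routing churn that makes the built-in fallback unusable with multipath, and the thresholds can be tested without touching the network.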
Setup Tasks:
Set up direct-to-end-device serial console access via SSH
Configure standardized firmware versions for the IM7248s and OM2224s
Set up standardized credentials
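Direct-to-end-device access replaces nested console sessions with one SSH hop per device, driven by a device-to-port inventory. A minimal sketch, assuming the console servers expose each serial port over SSH on TCP port 3000 plus the port number (a common OpenGear convention; confirm it against the firmware and port configuration in use). The inventory, device names, and the "ops" user are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ConsolePort locates one end device's serial console.
type ConsolePort struct {
	Server string // console server hostname (e.g. an IM7248)
	Port   int    // 1-based serial port number on that server
}

// sshBase is the assumed TCP offset for direct per-port SSH
// (port N served on sshBase+N); confirm against the OpenGear config in use.
const sshBase = 3000

// sshArgs returns ssh arguments that reach a device's console directly,
// avoiding nested console sessions.
func sshArgs(user string, inv map[string]ConsolePort, device string) ([]string, error) {
	cp, ok := inv[device]
	if !ok {
		return nil, fmt.Errorf("no console mapping for %q", device)
	}
	if cp.Port < 1 {
		return nil, fmt.Errorf("invalid serial port %d for %q", cp.Port, device)
	}
	return []string{"-p", fmt.Sprint(sshBase + cp.Port), user + "@" + cp.Server}, nil
}

func main() {
	// Hypothetical inventory; ideally generated from the cabling plan.
	inv := map[string]ConsolePort{
		"storage-a01": {Server: "im7248-a1", Port: 2},
		"compute-a01": {Server: "im7248-a1", Port: 1},
	}
	names := make([]string, 0, len(inv))
	for n := range inv {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		args, err := sshArgs("ops", inv, n)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: ssh %s\n", n, strings.Join(args, " "))
	}
}
```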
Cellular Requirements:
A business cellular plan is required for the OM2224s
Minimum 10 GB/month, 50 GB+ preferred
Needed for emergency access and recovery scenarios
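To put the plan sizes in perspective, a monthly cap can be converted into the average rate it could sustain, and the cost of periodic heartbeats over the cellular path can be estimated. A minimal sketch, assuming decimal gigabytes (1 GB = 10^9 bytes) and a 30-day month; the 1 KB-per-heartbeat, 60-second figures are hypothetical placeholders, not measurements.

```go
package main

import "fmt"

const secondsPerDay = 86400

// avgKbps converts a monthly allowance in decimal gigabytes into the
// average rate it could sustain over a month of the given length.
func avgKbps(gbPerMonth float64, days int) float64 {
	bits := gbPerMonth * 1e9 * 8
	return bits / float64(days*secondsPerDay) / 1e3
}

// probeBytesPerMonth estimates the data a periodic heartbeat consumes.
func probeBytesPerMonth(bytesPerProbe, intervalSec, days int) int {
	return bytesPerProbe * (days * secondsPerDay / intervalSec)
}

func main() {
	for _, gb := range []float64{10, 50} {
		// Prints 30.9 kbit/s for 10 GB and 154.3 kbit/s for 50 GB.
		fmt.Printf("%.0f GB/month ~ %.1f kbit/s average\n", gb, avgKbps(gb, 30))
	}
	// Hypothetical: 1 KB per heartbeat every 60 s over the cellular path.
	fmt.Println("heartbeat bytes/month:", probeBytesPerMonth(1000, 60, 30)) // 43200000
}
```

The cap limits volume, not peak rate, so these averages are only a sense of scale: interactive console/SSH sessions plus heartbeats (about 43 MB/month under the placeholder figures) fit easily in 10 GB, while repeated large transfers during a recovery are what would consume it, which is the case for the 50 GB+ preference.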
Constraints:
AT&T allows OpenGears but blocks Palo Alto traffic on consumer plans
Verizon 4G coverage is inconsistent in the colo area
One T-Mobile 4G band is unsupported by the OM2224 modem
Coordination:
Work with the colocation vendor to install an antenna extension on the roof to ensure reliable cellular coverage