Meet The Team
The team specialises in delivering high-impact Proof of Concepts (PoCs) for AI-driven and massively scalable data-centre solutions. We work directly with customers to understand sophisticated requirements and architectures, taking full ownership of system configuration and end-to-end PoC execution. In these engagements, we rigorously analyse any convergence-related test case that doesn't produce the expected result, drilling deep into protocols, system behaviour, and configuration to identify and resolve performance gaps. This enables us to deliver actionable optimisation insights, fine-tuning recommendations, and mentorship guidance that achieve lower convergence times, low latency, and superior performance. We perform root-cause analysis to ensure reliability and repeatable results, and collaborate closely with development engineering teams to supervise and resolve improvement requests, ensuring production-ready outcomes.
Your Impact
- Work with cutting-edge 400G/800G Ethernet fabrics used in AI and GPU data centers.
- Gain exposure to IP-routed and VxLAN fabrics and the full NVMeoF stack, from Linux kernel tuning to RoCEv2 fabric automation.
- Work in a hands-on lab environment involving both software-based and hardware-based testing.
- Collaborate with senior architects to design next-generation lossless Ethernet builds for AI workloads.
- Configure and validate Linux/Ubuntu servers for RoCEv2 and NVMeoF traffic generation using 400G NICs.
- Perform system-level tuning including kernel parameters, sysctl, IRQ affinity, NUMA, hugepages, and NIC offloads to optimize RDMA performance.
- Execute NVMeoF traffic tests using software tools (e.g., IB Perf) and hardware traffic generators (Spirent).
- Capture and analyze fabric metrics such as latency, throughput, packet loss, and congestion response.
- Collaborate with infrastructure teams to ensure lossless Ethernet behavior using PFC and ECN.
- Configure and validate BGP EVPN/VxLAN-based fabrics for IP-routed data center connectivity.
- Troubleshoot Layer 2/3 networking issues, RoCEv2 packet drops, and flow control problems.
- Collect and analyze NIC counters, RDMA statistics, and telemetry data for performance troubleshooting.
- Create and maintain detailed test plans, automation documentation, and configuration templates.
- Generate comprehensive test reports and performance summaries for engineering and management reviews.
- Collaborate cross-functionally with hardware, software, and AI platform teams to enhance fabric performance and reliability.
Minimum Qualifications
- Solid understanding of RDMA, RoCEv2, and NVMe over Fabrics (NVMeoF over RDMA) architectures.
- Strong hands-on experience with Linux/Ubuntu systems administration and tuning.
- Proficiency in Python scripting and automation frameworks (pytest, paramiko, REST APIs, YAML, Ansible).
- Experience configuring PFC and ECN parameters on Linux servers and network switches for RoCEv2.
- Familiarity with software traffic generation tools and commercial hardware traffic generators (Ixia, Spirent).
- Deep knowledge of network protocols including IP, BGP, VxLAN, TCP congestion control, and QoS.
- Strong debugging skills using Linux utilities (ethtool, tc, sysctl, ibv_devinfo, rdma link, nvme list).
- Experience tuning 400G NICs (MTU, SR-IOV, offloads, NUMA pinning, IRQ balancing).
Preferred Skills
- Understanding of network telemetry and performance monitoring tools (SNMP, gNMI, Prometheus).
- Exposure to AI or GPU cluster interconnect testing (InfiniBand or RoCE).
- Familiarity with Cisco Nexus 9K.
- Working knowledge of automation tools like Ansible or Jenkins.
- Experience with SPDK/DPDK and user-space NVMeoF implementations.
- Bachelor's or Master's degree in Computer Engineering, Electrical Engineering, Computer Science, or equivalent practical experience in network test and automation environments.
Why Cisco
At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint.
Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere.
We are Cisco, and our power starts with you.