
Product Manager - AI Data Center Infrastructure

5-10 Years
  • Posted a month ago
  • Over 50 applicants

Job Description

  • Define and own product strategy and requirements for next-generation AI data center networking platforms supporting large-scale GPU clusters.
  • Drive architecture for scale-up and scale-out AI fabrics delivering deterministic performance, ultra-low latency, and high bandwidth efficiency.
  • Translate AI workload characteristics into clear product roadmaps for switching, routing, automation, and telemetry capabilities.
  • Specify Ethernet fabric designs using Juniper QFX platforms and Apstra-based intent-driven automation.
  • Lead requirements for GPU, NIC, and interconnect technologies aligned with NVIDIA and AMD ecosystems.
  • Define interoperability requirements across switches, NICs, DAC, AEC, ACC, and optical transceivers.
  • Specify L2/L3 architectures including EVPN-VXLAN, Class-E IPv4, and AI-optimized buffer and congestion management.
  • Drive performance optimization by analyzing GPU job behavior to identify congestion, packet loss, and microbursts.
  • Define and tune ECN, RDMA/RoCEv2, PFC, and traffic engineering policies for AI training and inference workloads.
  • Lead validation, scale testing, and certification of new switch software, optics, NIC firmware, and GPU platforms.
  • Participate in root-cause analysis for link stability, FEC/PCS errors, and power or thermal-related performance issues.
  • Work independently while driving alignment across engineering, operations, platform teams, and strategic ecosystem partners.
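
The congestion-analysis responsibilities above (identifying microbursts from GPU job behavior) can be sketched with a simple heuristic. The sample data, threshold values, and function name below are illustrative assumptions, not any platform's real telemetry API:

```python
# Hypothetical sketch: flagging microbursts in switch queue-depth telemetry.
# A microburst here is a sample far above the trailing moving average.

def detect_microbursts(samples, avg_window=8, burst_factor=4.0):
    """Return indices of samples whose queue depth exceeds burst_factor
    times the trailing moving average -- a simple microburst heuristic."""
    bursts = []
    for i, depth in enumerate(samples):
        window = samples[max(0, i - avg_window):i]
        if not window:
            continue  # no history yet for the first sample
        avg = sum(window) / len(window)
        if avg > 0 and depth > burst_factor * avg:
            bursts.append(i)
    return bursts

# Steady traffic with one transient spike at index 6.
queue_depths = [100, 110, 95, 105, 100, 98, 900, 102, 99]
print(detect_microbursts(queue_depths))  # [6]
```

In practice the input would come from streaming telemetry (e.g. per-queue depth counters exported by the switch), and the burst factor would be tuned against known-good training-job traffic profiles.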

Required Qualifications

  • 5-10+ years of experience in data center networking, AI infrastructure, or HPC environments.
  • Strong hands-on experience with Juniper QFX platforms and JunOS.
  • Deep understanding of GPU architectures including NVIDIA H100/H200, GB200/GB300, NVLink/NVSwitch, and AMD MI300/MI400 with Infinity Fabric.
  • Proven expertise in scale-up GPU interconnects and scale-out Ethernet fabrics.
  • Strong knowledge of RDMA/RoCEv2, ECN, PFC, buffer management, and congestion control.
  • Hands-on experience troubleshooting high-speed optics, AEC/ACC cables, link training, and NIC firmware.
  • Proficiency in automation and scripting using Python, Ansible, Bash, and Terraform.
  • Familiarity with distributed AI workloads and collective communication libraries such as NCCL and RCCL.
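
The automation and RoCEv2/PFC requirements above often combine in fabric-wide configuration audits. As a minimal sketch, assuming a hypothetical dict-of-dicts representation of per-switch config (the switch names, key names, and priority convention are illustrative, not a real vendor API):

```python
# Hypothetical sketch: auditing that every leaf switch enables PFC on the
# lossless priority carrying RoCEv2 traffic. The config shape is assumed.

ROCE_PRIORITY = 3  # a common convention for the lossless RDMA traffic class

def audit_pfc(configs, roce_priority=ROCE_PRIORITY):
    """Return switch names whose PFC-enabled priorities omit the RoCE class."""
    return sorted(
        name for name, cfg in configs.items()
        if roce_priority not in cfg.get("pfc_priorities", [])
    )

fabric = {
    "leaf1": {"pfc_priorities": [3]},
    "leaf2": {"pfc_priorities": [3, 4]},
    "leaf3": {"pfc_priorities": []},  # misconfigured: PFC disabled
}
print(audit_pfc(fabric))  # ['leaf3']
```

A real implementation would pull the running config via an automation framework such as Ansible or a platform API rather than a hard-coded dict, but the consistency check itself stays this simple.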

Preferred Qualifications

  • Industry certifications such as JNCIE, CCIE, NCP-AII, NCA-AIIO, NCP-AIO, or NCP-AIN.
  • Experience with Apstra or other intent-based networking platforms.
  • Knowledge of 1.6T optics, 200G PAM4 SerDes, and CPO/LPO architectures.
  • Experience supporting liquid-cooled GPU clusters and rack-level power and network design.
  • Understanding of data center operations, observability, and SLA-driven AI infrastructure environments.

More Info

Open to candidates from: Indian

About Company

The Hewlett-Packard Company, commonly shortened to Hewlett-Packard or HP, was an American multinational information technology company headquartered in Palo Alto, California. HP developed and provided a wide variety of hardware components, as well as software and related services to consumers, small and medium-sized businesses (SMBs), and large enterprises, including customers in the government, health, and education sectors. The company was founded in a one-car garage in Palo Alto by Bill Hewlett and David Packard in 1939, and initially produced a line of electronic test and measurement equipment. The HP Garage at 367 Addison Avenue is now designated an official California Historical Landmark, and is marked with a plaque calling it the "Birthplace of 'Silicon Valley'".

Job ID: 139683853