InfiniBand Engineer (High-Performance Networking)

Aptly Technology Corporation

India

5-7 Years

Posted a month ago

Job Description

Job Summary

We are seeking a highly skilled InfiniBand Engineer with strong expertise in advanced networking technologies to design, deploy, and support high-performance, low-latency network infrastructures. The ideal candidate will have hands-on experience with InfiniBand fabrics, data center networking, and large-scale distributed computing environments (HPC / AI / ML clusters).

Key Responsibilities

Design, implement, and manage large-scale InfiniBand (IB) fabrics in data center and HPC environments.
Configure and troubleshoot InfiniBand switches and adapters (e.g., Mellanox / NVIDIA IB platforms).
Perform fabric bring-up, subnet management (OpenSM), partitioning, and performance tuning.
Monitor and optimize network performance, latency, throughput, and congestion control.
Integrate InfiniBand with Ethernet-based networking environments.
Support RDMA technologies (RoCE, iWARP) and GPUDirect environments.
Collaborate with system, storage, and compute teams to support AI/ML and distributed workloads.
Perform firmware upgrades, patching, and capacity planning.
Troubleshoot Layer 2 / Layer 3 networking issues (BGP, OSPF, VLAN, VXLAN, etc.).
Maintain documentation, network diagrams, and SOPs.

Required Skills & Qualifications

5+ years of networking experience with strong fundamentals (TCP/IP, routing, switching).
Hands-on experience with InfiniBand technologies (HDR/NDR preferred).
Experience with NVIDIA / Mellanox Technologies switches and adapters.
Strong understanding of RDMA, congestion control, QoS, and low-latency tuning.
Experience with subnet managers (OpenSM) and fabric diagnostic tools.
Understanding of BGP, OSPF, EVPN-VXLAN, and MPLS is good to have.
Experience in HPC or AI/ML cluster networking environments is highly preferred.
Familiarity with Linux networking and troubleshooting tools.
Experience with automation (Python, Ansible) is a plus.

Preferred Qualifications