QI-CAP INVESTMENTS PRIVATE LIMITED

Infrastructure Operations Lead

  • Posted 3 hours ago

Job Description

The Colocation Infrastructure Operations Lead owns and operates the NSE, BSE, and MCX data center environments end to end, ensuring continuous availability, performance, and compliance for all trading infrastructure. The role oversees onsite and remote DC engineers, manages regular changes and deployments, handles change management, interacts with the exchanges, and ensures that all Linux-based systems operate with ultra-low latency and high reliability. Its scope covers Linux servers, NIC firmware, automation, vendor coordination, monitoring, and trading infrastructure performance. The role requires strong decision-making under pressure in critical situations: the candidate will lead engineers, own deployments, and interface directly with management, the operations team, exchanges, and traders. The candidate must also have a sharp understanding of severity classifications, urgency levels, and real-time decision-making, with the ability to assess business impact in HFT environments where every microsecond matters.

Key Roles/Responsibilities

1. Exchange Colocation Operations

  • Manage day-to-day operations across multiple broker colocation racks at NSE/BSE/MCX/FTTOWER/MBKC.
  • Coordinate with onsite engineers for installation, maintenance, audits, and physical tasks.
  • Handle new rack provisioning, power planning, cable management, and device deployment.
  • Maintain strong working relationships with exchange technical teams and facility managers.
  • Bring up and validate L1/L2 links, multicast feeds, member connectivity, heartbeat checks, flow control, and latency path auditing.

2. OS Operations

  • Oversee all Linux servers used for trading, including regular checks and benchmarking of new servers.
  • Expertise in tuning motherboards from various brands to achieve low latency.
  • Ensure OS hardening, kernel upgrades for low latency, BIOS/UEFI tuning, clock sync/PTP, CPU isolation, hugepages, and performance optimization.
  • Own automation playbooks (Ansible/Bash) for deployments, config updates, and monitoring.
  • Deep understanding of kernel bypass and interrupt handling.

3. NIC Operations (Firmware & Driver Lifecycle Management)

  • Install, upgrade, and validate NIC drivers/firmware (Solarflare/ExaNIC/Mellanox/Intel/Xilinx).
  • Manage PCIe tuning, write NIC-related tuning scripts, and apply optimizations, including ef_vi tuning and TCPDirect/Onload.
  • Work closely with quant developers to gather feedback on the systems and further improve ultra-low-latency optimization on the server side.

4. Monitoring & Troubleshooting

  • Own monitoring dashboards (Zabbix/Prometheus/PRTG) for servers, switches, latency metrics, and colocation racks.
  • Handle L1-L3 troubleshooting for hardware, network links, kernel panics, PTP drift, and feed issues.
  • Stay up to date with new colo guidelines and exchange colocation updates.
  • Lead root-cause analysis for production incidents, outages, and exchange-side failures.
  • Lead L1-L3 debugging for any infra issues

5. Vendor, Procurement & Inventory

  • Manage procurement for servers, NICs, optics, cables, switches, IPMI, and rack accessories.
  • Work with vendors for RMA, price negotiations, delivery, and AMC/SLAs/Warranty.
  • Maintain accurate inventories of hardware, licenses, firmware versions, additional spares and rack assets.

Preferred Qualifications / Technical Expectations / Skills / Experience Required

  • B.E. in Computer Science (CSE) preferred.
  • 8-12+ years in datacentre operations managing low-latency trading infrastructure.
  • Prior hands-on experience in NSE/BSE/MCX colocation ecosystems, including rack bring-up, member connectivity, multicast feeds, and exchange coordination.
  • Expertise in overclocked (OC) server optimization, BIOS tuning, thermal/voltage profiling, and core isolation techniques for trading workloads.
  • Proven experience managing and tuning HFT-grade servers including kernel bypass stacks, low-latency drivers, ef_vi, TCPDirect/Onload, and Solarflare/ExaNIC/Xilinx technology stacks.
  • Experience with X3/X4/SFC/ExaNIC/Mellanox/Intel/Broadcom NICs, firmware, and tuning.
  • Deep knowledge of Linux OS internals (NUMA, cgroups, threading models, NVMe drives, job scheduling, network stacks, PTP clocking, IRQ affinity, kernel modules).
  • Deep understanding of HFT/low-latency environments, colocation constraints, trading rack setup, architecture and exchange rules.
  • Knowledge of networking: L1/L2 concepts, multicast, BGP basics, VLANs, port channels, IP SLA, PTP.
  • Expertise in BIOS tuning of OC machines.
  • Prior experience with NSE/BSE/MCX colocation.
  • Experience working in low-latency HFT firms.
  • Strong troubleshooting ability for hardware, kernel, and network-level issues.
  • Excellent coordination skills across vendors, exchanges, and internal teams.
  • Ability to provide constant feedback about infra/systems/monitoring/processes to further streamline things.
  • Experience handling DC procurement, vendor management, RMA cycles, AMC/SLA agreements, and cost optimization for hardware + networking components.
  • Good understanding of OSI L1/L2 networking, multicast routing, BGP basics, VLANs, LAG/Port Channels, MTU, link-level flow control, optics, transceivers, and DC cabling standards.
  • Ability to collaborate with quant teams, trading desk, risk, and business stakeholders to support latency and reliability goals.
  • Experience leading 24x7 high availability trading operations.
  • Strong incident + outage management skills (war room, RCA, PIR reports, dashboards).
  • Cybersecurity awareness for trading infra: SSH hardening, network segmentation, log forensics.
  • Has led L1/L2/L3 engineering teams in fast-paced trading environments.
  • Strong communication skills with ability to translate technical situations to business impact.
  • Decision-making under pressure, especially during market hours and exchange incidents.
  • Strong expertise in Linux (Ubuntu/RHEL) systems administration.

More Info

Job ID: 144377501