Data Center Operations Engineer
Location: Hyderabad
12-Month Contract
The Data Center Operations Engineer is responsible for supporting, maintaining, and deploying critical data center infrastructure with a strong focus on Linux-based systems, GPU server deployments, and InfiniBand networking. This role requires hands-on expertise in data center operations, cluster bring-up, hardware installation, and troubleshooting across compute, network, and GPU environments. The engineer will collaborate closely with global infrastructure, development, and operations teams to ensure reliable, secure, and scalable service delivery.
Required Qualifications
- Bachelor's degree in Computer Science, Engineering, Information Technology, or equivalent practical experience.
- Strong hands-on experience in Linux environments, including system administration, troubleshooting, and performance validation.
- Proficiency with Linux command-line tools and shell scripting (Bash or equivalent).
- Experience with cluster bring-up, driver installation, and system-level configuration.
- Hands-on experience setting up and validating GPU servers in clustered environments.
- Experience with end-to-end GPU testing in InfiniBand-based clusters.
- Working knowledge of InfiniBand networking, including switch configuration and subnet management.
- Solid understanding of networking fundamentals, including the OSI model and TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP, SMTP, FTP, TFTP).
- Experience installing, configuring, and troubleshooting routers, switches, and terminal servers.
- Familiarity with fiber and copper cabling, including IP and SAN deployments.
- Experience managing incident tickets, maintaining acceptable ticket loads, and meeting SLAs.
- Strong organizational skills with meticulous attention to detail in data center environments.
- Ability to follow and enforce documented escalation procedures and operational policies.
- Strong verbal and written communication skills, with the ability to collaborate effectively with cross-functional and global teams.
Preferred Qualifications
- Experience supporting HPC, AI, or large-scale GPU environments.
- Exposure to data center monitoring.
- Experience documenting operational processes and maintaining technical runbooks.
- Familiarity with large-scale data center buildouts or refresh programs.
Physical Requirements
- Ability to perform the essential functions of the role, including lifting, moving, and installing equipment weighing 50 pounds or more, with or without reasonable accommodation.
- Ability to work in data center environments, including raised floors, equipment racks, and confined spaces.
- Willingness to work flexible hours, including nights, weekends, and on-call rotations as required.
Work Environment
- On-site data center environment with occasional remote coordination.
- Interaction with hardware vendors, service providers, and internal engineering teams.
- Fast-paced operational setting requiring attention to detail, adherence to safety standards, and rapid problem resolution.