HPC Linux System Administrator

4-8 Years

Job Description

As an HPC Systems Engineer, you will manage and maintain high-performance computing (HPC) clusters, ensuring optimal performance, stability, and availability. You will work hands-on with Linux/Unix systems, storage, networking, and automation tools to support development, testing, and release validation in HPC environments. This role also involves mentoring junior team members and collaborating across geographies to deliver reliable HPC infrastructure.

Key Responsibilities:

HPC Cluster Management

  • Manage, maintain, and optimize HPC clusters, including compute, storage, and management nodes.
  • Install, configure, and update Linux/Unix-based systems for high availability and performance.
  • Oversee lab systems used for software development, testing, and release validation.
  • Apply OS, firmware, and security updates to maintain system stability and compliance.

System Administration & Troubleshooting

  • Administer Linux/Unix systems, ensuring security, performance, and reliability.
  • Conduct hardware troubleshooting and coordinate with vendors or internal teams for repairs/replacements.
  • Monitor system performance, perform health checks, and implement preventive maintenance measures.
  • Manage storage systems (NFS, Lustre, GPFS, RAID) and ensure efficient data flow across the HPC environment.

Automation & Optimization

  • Develop and maintain automation scripts using Bash, Python, or Ansible to improve operational efficiency.
  • Perform system imaging, software provisioning, and configuration management.
  • Implement solutions to automate routine tasks and optimize HPC workflows.

Documentation & Collaboration

  • Maintain system documentation, including configuration details, maintenance procedures, and troubleshooting guides.
  • Collaborate with cross-functional teams to resolve issues, plan upgrades, and support HPC project activities.
  • Provide guidance and mentoring to less-experienced staff members.

What You Need to Bring:

Education and Experience:

  • Bachelor's or Master's degree in Computer Science, Information Systems, or a related field.
  • 4-8 years of hands-on experience in Linux/Unix administration and HPC cluster management.

Knowledge and Skills:

  • Strong proficiency in Linux/Unix administration (installation, configuration, tuning, troubleshooting).
  • Experience managing HPC clusters (e.g., HPE Cray systems) and workload managers such as Slurm, PBS, or LSF.
  • Solid understanding of networking fundamentals (TCP/IP, DNS, DHCP, VLANs).
  • Experience with storage management systems such as NFS, Lustre, or GPFS.
  • Hands-on experience in hardware diagnostics and maintenance.
  • Familiarity with system monitoring tools (Prometheus, Grafana, Nagios).
  • Working knowledge of containerization (Docker, Singularity) and virtualization technologies is a plus.
  • Proficiency in shell scripting (Bash).
  • Familiarity with Python or Ansible for automation and orchestration.
  • Strong troubleshooting, analytical, and problem-solving skills with a focus on root cause analysis.
  • Experience maintaining accurate system documentation and change logs.

More Info

Open to candidates from: India

About Company

The Hewlett-Packard Company, commonly shortened to Hewlett-Packard or HP, was an American multinational information technology company headquartered in Palo Alto, California. HP developed and provided a wide variety of hardware components, as well as software and related services to consumers, small and medium-sized businesses (SMBs), and large enterprises, including customers in the government, health, and education sectors. The company was founded in a one-car garage in Palo Alto by Bill Hewlett and David Packard in 1939, and initially produced a line of electronic test and measurement equipment. The HP Garage at 367 Addison Avenue is now designated an official California Historical Landmark, and is marked with a plaque calling it the "Birthplace of 'Silicon Valley'".

Job ID: 139867533