
HPC Engineer

5-10 Years
  • Posted 5 hours ago

Job Description

Job Location: Singapore (Onsite)

Job Summary

We are seeking a skilled High-Performance Computing (HPC) Engineer with 5–10 years of experience to design, deploy, manage, and optimize HPC cluster environments. The ideal candidate will have hands-on experience with cluster scheduling, monitoring, performance tuning, and supporting scientific or engineering workloads in Linux-based environments.

Key Responsibilities

·       Design, deploy, and maintain HPC cluster infrastructure to ensure high availability and performance.

·       Manage and configure job scheduling systems such as PBS and SLURM.

·       Implement and maintain monitoring solutions using Grafana, Nagios, Prometheus, and Ganglia.

·       Administer cluster management tools including Bright Cluster Manager, xCAT, and Puppet for infrastructure automation.

·       Configure and troubleshoot high-speed networking technologies including InfiniBand and Gigabit Ethernet.

·       Perform system performance analysis, profiling, and debugging using tools like Intel VTune, Valgrind, and gprof.

·       Provide application support for scientific and engineering workloads using GNU, Intel, and CUDA compilers, as well as MKL libraries.

·       Manage virtualization environments using Proxmox and handle license management tools like FlexLM.

·       Configure and maintain storage solutions including parallel file systems and enterprise object storage platforms.

·       Ensure system security, patching, and compliance in Red Hat Linux environments.

·       Collaborate with research, engineering, and IT teams to optimize workloads and resource utilization.

·       Document system architecture, processes, and troubleshooting guides.

Required Skills & Qualifications

·       5–10 years of experience in HPC systems administration or engineering.

·       Strong experience with job schedulers such as PBS and SLURM.

·       Hands-on experience with monitoring tools: Grafana, Nagios, Prometheus, Ganglia.

·       Expertise in cluster management tools like Bright Cluster Manager, xCAT, and Puppet.

·       Solid understanding of HPC networking, including InfiniBand and Ethernet.

·       Experience with performance profiling and debugging tools (Intel VTune, Valgrind, gprof).

·       Familiarity with compilers and libraries: GNU, Intel, CUDA, MKL.

·       Experience with virtualization platforms like Proxmox and license management (FlexLM).

·       Knowledge of storage technologies: parallel file systems (e.g., Lustre, GPFS) and object storage.

·       Strong Linux administration skills, specifically Red Hat Enterprise Linux.

·       Scripting skills (Bash, Python, or similar) for automation and troubleshooting.

More Info

Open to candidates from: Indian

About Company

TekWissen's Staffing division is a recruitment-centric organization focused on providing talent acquisition services (both IT and non-IT) in the Technology, Engineering, Clinical, Legal, Scientific, Finance, Marketing, Professional and Payroll Management arenas to clients across the US and India. Founded in 2009, TekWissen is one of the fastest-growing staffing firms in the United States. Our recognitions include a #192 ranking on the Inc. 5000 list of fastest-growing companies in the USA, #15 Top IT Service Company in 2014 (Inc.com), #6 among Top Michigan Companies in 2014 (Inc.com), Michigan 50 Companies to Watch in 2014, and the FastTrack Award for 2014.

Job ID: 147033769