Linux System Administrator

SISL Global

Chennai, India

Fresher

Posted 2 days ago

Job Description

HPC Engineer (HPC with SLURM, CPU & GPU Clusters)

Position Overview

We are seeking a skilled HPC Engineer to design, deploy, manage, and optimize our on-premises High Performance Computing (HPC) environment, which consists of SLURM-managed CPU and GPU clusters. The ideal candidate will have a strong understanding of HPC architecture, Linux systems, job scheduling, and cluster operations. Experience with parallel file systems and enterprise storage solutions such as WekaFS or Scality is preferred but not required.

Key Responsibilities

1. HPC Infrastructure & Operations

Manage day-to-day operations of on-premises HPC clusters, including CPU and GPU compute nodes.

Monitor cluster health, performance, and utilization, ensuring high availability and efficiency.

Implement and maintain best practices for HPC operations, user management, and resource administration.

Troubleshoot cluster-related issues, including networking problems, node failures, job failures, and performance bottlenecks.

Support users with job submission, resource usage, and HPC workflows.

2. SLURM Workload Manager (Mandatory)

Install, configure, and manage the SLURM workload manager across multiple clusters.

Handle queue creation, partition configuration, node allocation, fair-share policies, and job prioritization.

Perform hands-on SLURM upgrades, migrations, and service maintenance.

Work with SLURM APIs and integrations to support automation and custom workflows.

Optimize scheduling policies for mixed CPU/GPU workloads.

3. Linux System Administration

Manage Linux-based compute nodes, head nodes, and administration servers.

Perform OS updates, package installations, security patching, and system tuning.

Knowledge of scripting (Bash, Python) for automation and HPC tooling workflows.

4. Parallel Computing & Cluster Architecture

Understanding of parallel computing concepts: MPI, OpenMP, distributed execution.

Familiarity with HPC building blocks: interconnect networks (InfiniBand/100G), storage tiers, resource managers, monitoring tools.

Ability to analyze and troubleshoot performance issues in parallel workloads.

5. Storage (Optional but Preferred)

A. WEKA (WekaFS) (Optional)

Knowledge of parallel file systems and performance tuning.

Diagnose and resolve issues related to WekaFS with minimal downtime.

Provide guidance to internal teams on WekaFS usage and best practices.

Stay updated with Weka ecosystem advancements and propose improvements.

B. Scality (Optional)

Troubleshoot and maintain Scality RING and ARTESCA environments.

Monitor, tune, and optimize Scality-based storage for high availability and reliability.

Create and maintain documentation for Scality configuration and SOPs.

Recommend performance improvements based on new Scality enhancements.

Qualifications & Skills

Mandatory Skills

Experience managing HPC clusters with SLURM in production environments.

Good understanding of Linux (RHEL) administration.

Knowledge of parallel computing concepts and HPC architecture.

Strong troubleshooting and diagnostic skills.

Ability to work in complex, multi-node distributed environments.

Preferred/Optional Skills

Experience with WekaFS, Scality RING, or other parallel/distributed file systems.

Exposure to GPU computing (CUDA, NVIDIA drivers, GPU scheduling).

Familiarity with monitoring tools (Grafana, Prometheus).

More Info

Job Type: Permanent Job

Industry: Other

Function: High Performance Computing

Employment Type: Full time

About Company

SISL Global

Job Source: www.linkedin.com

Job ID: 143248023

Last Updated: 22-02-2026 07:09:39 PM