Larsen & Toubro Limited

HPC Engineer

This job is no longer accepting applications

  • Posted 2 months ago

Job Description

  • Design, deploy and configure HPC Clusters including compute, storage and networking components.
  • Handle installation requests on HPC, application upgrades, and troubleshooting in coordination with users, software vendors and OEMs.
  • Administer job schedulers (e.g., Slurm), manage user access, monitor health and troubleshoot system issues in both on-premises and cloud environments.
  • Optimize HPC workloads, tune resource utilization and benchmark system performance.
  • Install and maintain HPC hardware, software stacks, compilers, libraries (e.g., MPI, OpenMP) and custom tools. Configure VMs, storage and servers on cloud.
  • Assist and guide users in optimizing and running applications on the cluster and cloud. Ensure system stability through regular updates, proactive monitoring and software/hardware troubleshooting.

Responsibilities

  • Supervise day-to-day support operations for the HPC and Cloud team, ensuring ticket SLA adherence.
  • Manage support ticket systems, primarily using internal IT tools.
  • Ensure timely resolution of user issues related to CAE applications in HPC & Cloud.
  • Plan, schedule, and oversee application upgrades and installations.
  • Collaborate with internal teams and external vendors to ensure seamless issue resolution.
  • Generate detailed monthly performance reports, analysing key trends and areas for improvement.

Technical Skills

  • Operating Systems: Expertise in Linux (RHEL, CentOS, Ubuntu)
  • HPC Tools and Frameworks:
      1. Job Schedulers: Slurm, PBS & Sync-HPC
      2. Parallel Programming: MPI, OpenMP, CUDA
      3. Scripting: Python, Bash and optionally C/C++
  • Cloud: Knowledge of AWS, GCP & Azure, including HPC toolkits and VM & object storage creation.
  • Networking: Knowledge of high-speed networks (InfiniBand, RDMA, Ethernet)
  • Storage Systems: Experience with parallel file systems (Lustre, NFS)
  • Hardware: Familiarity with HPC-specific hardware, including RAM, CPU & GPU

Certifications

  • Any Cloud Solution Architect certificate (GCP preferred)
  • RHEL Certified System Administrator (Preferred)

More Info

Open to candidates from: Indian

Job ID: 104228685