HPE Engineer Permanent WFH (Remote)

10-20 Years
  • Posted an hour ago

Job Description

High Performance Computing (HPC) Engineer

Summary:

We are seeking an experienced HPC Subject Matter Expert (SME) to design, implement, and optimize high-performance compute solutions that enable our development teams to run large-scale, compute-intensive workloads efficiently. This role focuses on using Azure and on-premises HPC services, MPI networking, and infrastructure automation to support data science, quantitative research, and simulation workloads. The ideal candidate combines strong technical expertise in parallel computing with practical experience in network optimization and developer enablement.

Key Responsibilities:

Architecture & Design

  • Design and implement HPC clusters on Azure using compute-optimized VMs, Azure CycleCloud, or custom orchestrations.
  • Architect MPI-based workloads (e.g., OpenMPI, Intel MPI) for distributed computation, ensuring optimal inter-node communication and performance.
  • Configure Azure Gateway, VNet, and subnet/netmask settings for low-latency and secure cluster networking.
  • Integrate HPC resources with Kubernetes, CI/CD pipelines, and storage backends (e.g., Blob, Lustre, or NFS).

Performance Optimization

  • Profile, benchmark, and tune MPI applications and compute nodes for maximum performance.
  • Optimize network topologies, NUMA configurations, and job schedulers (SLURM, PBS, or Azure Batch).
  • Automate resource scaling and cost optimization through Infrastructure-as-Code (IaC) tools such as Terraform or ARM templates.

Developer Enablement

  • Collaborate with application developers and data scientists to containerize or parallelize workloads.
  • Provide best practices for distributed computing, debugging, and performance analysis in MPI environments.
  • Develop and maintain internal documentation, training materials, and onboarding guides for HPC usage.

Security & Governance

  • Ensure compliance with enterprise security standards for HPC environments.
  • Collaborate with network and security teams to implement proper access control, encryption, and segmentation.
  • Support audit readiness and security hardening for compute nodes and associated data paths.

Qualifications Required:

  • Bachelor's or Master's degree in Computer Science, Engineering, Physics, or a related field.
  • 10+ years of experience in HPC architecture, system administration, or parallel computing.
  • Deep understanding of MPI standards, interconnects (InfiniBand, RDMA), and network performance tuning.
  • Experience with Azure HPC services, Azure Gateway, and networking concepts (VNets, subnets, netmasks, NSGs).
  • Proficiency with Linux system administration (RHEL/CentOS/Ubuntu) and shell scripting.
  • Strong automation skills using Terraform, Ansible, or Python.

Preferred

  • Experience with containerized HPC workloads (Docker, Singularity, Kubernetes).
  • Familiarity with GPU workloads (CUDA, NCCL) and hybrid CPU/GPU clusters.
  • Exposure to CI/CD pipelines integrating HPC workloads for DevOps or MLOps use cases.
  • Strong communication skills, with the ability to translate complex HPC concepts for developers.

More Info

Open to candidates from:
Indian

Job ID: 146995411