Amgen Inc

Senior High Performance Computing Engineer

4-8 Years
  • Posted 10 hours ago

Job Description

  • Implement and manage cloud-based infrastructure that supports HPC environments for data science (e.g., AI/ML workflows, Image Analysis).
  • Collaborate with data scientists and ML engineers to deploy scalable machine learning models into production.
  • Ensure the security, scalability, and reliability of HPC systems in the cloud.
  • Optimize cloud resources for cost-effective and efficient use.
  • Stay current with the latest cloud services and industry-standard practices.
  • Provide technical leadership and guidance in cloud and HPC systems management.
  • Develop and maintain CI/CD pipelines for deploying resources to multi-cloud environments.
  • Monitor and troubleshoot cluster operations, applications, and cloud environments.
  • Document system design and operational procedures.

Must-Have Skills:

  • Expertise in Linux/Unix system administration (RHEL, CentOS, Ubuntu, etc.).
  • Proficiency with job scheduling and resource management tools (SLURM, PBS, LSF, etc.).
  • Good understanding of parallel computing, MPI, OpenMP, and GPU acceleration (CUDA, ROCm).
  • Knowledge of storage architectures and distributed file systems (Lustre, GPFS, Ceph).
  • Experience with containerization technologies (Singularity, Docker) and cloud-based HPC solutions.
  • Expertise in scripting languages (Python, Bash) and container tooling and orchestration (Docker, Kubernetes).
  • Familiarity with automation tools (Ansible, Puppet, Chef) for system provisioning and maintenance.
  • Understanding of networking protocols, high-speed interconnects, and security best practices.
  • Demonstrable experience in cloud computing (AWS, Azure, GCP) and cloud architecture.
  • Experience with infrastructure as code (IaC) tools such as Terraform or CloudFormation, and with Git for version control.

What we expect of you

  • We are all different, yet we all use our unique contributions to serve patients.
  • Expert knowledge in large Linux environments, networking, storage, and cloud-related technologies.
  • Expertise in root-cause analysis and remediation while working with a team and stakeholders.
  • Top-level communication and documentation skills are required.
  • Expertise in Python, Bash, and YAML is expected.

Good-to-Have Skills:

  • Experience with Kubernetes (EKS) and service mesh architectures.
  • Knowledge of AWS Lambda and event-driven architectures.
  • Familiarity with AWS CDK, Ansible, or Packer for cloud automation.
  • Exposure to multi-cloud environments (Azure, GCP).

Basic Qualifications:

  • Bachelor's degree in computer science, IT, or a related field, with 6-8 years of hands-on experience in HPC administration or a related area.

Additional Skills:

  • Experience supporting research in healthcare and life sciences.
  • Deep, extensive experience with High Performance Computing (HPC) and cluster management.
  • Familiarity with machine learning frameworks (TensorFlow, PyTorch) and data pipelines.
  • Certifications in cloud architecture (AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, etc.).
  • Experience in an Agile development environment.
  • Prior work with distributed computing and big data technologies (Hadoop, Spark).

Professional Certifications (preferred):

  • Red Hat Certified Engineer (RHCE) or Linux Professional Institute Certification (LPIC).
  • AWS Certified Solutions Architect Associate or Professional.

Preferred Qualifications:

Soft Skills:

  • Strong analytical and problem-solving skills.
  • Ability to work effectively with global, virtual teams.
  • Effective communication and collaboration with cross-functional teams.
  • Ability to work in a fast-paced, cloud-first environment.

Job ID: 111853193