Software Developer 3

3-8 Years
  • Posted 3 hours ago
  • Over 100 applicants

Job Description

Basic Qualifications:

  • BS or MS degree in CS or a related engineering or science field, with 3-5+ years of relevant experience.
  • Experience with benchmarking, and with troubleshooting or optimizing system performance.
  • Experience with coding, scripting, and automation.
  • Background in Networking.
  • General Linux skills.
  • Demonstrated ability to lead complex projects, independently resolve ambiguity, collaborate with stakeholders across teams, and communicate effectively.

Desired qualifications:

  • Experience working on clusters, e.g., running HPC/AI workloads, or maintaining an HPC/AI system.
  • Experience troubleshooting or tuning performance on distributed systems.
  • Familiarity with elements of the AI/HPC software stack such as job schedulers (e.g., Slurm); NCCL, RCCL, or MPI; or ML frameworks.
  • Experience with RDMA networking, i.e., RoCE or InfiniBand.
  • Experience architecting or developing solutions on a public cloud platform.

Responsibilities

  • Carry out performance studies on GPU clusters with a focus on AI/ML workload performance, network performance, and tuning.
  • Design and code solutions for performance benchmarking.
  • Troubleshoot performance problems on RDMA clusters and perform cluster performance validation, including on novel systems that are not yet fully understood.
  • Document new tools and procedures to a high standard.
  • Write whitepapers to disseminate findings of performance studies.
  • Participate in architecture design and review, code review, and contribute to roadmap development.
  • Mentor junior engineers.
  • Participate in operational rotations.

About Company

We’re a cloud technology company that provides organizations around the world with computing infrastructure and software to help them innovate, unlock efficiencies, and become more effective. We also created the world’s first – and only – autonomous database to help organize and secure our customers’ data. Oracle Cloud Infrastructure offers higher performance, security, and cost savings. It is designed so businesses can move workloads easily from on-premises systems to the cloud, and between cloud, on-premises systems, and other clouds. Oracle Cloud applications provide business leaders with modern applications that help them innovate, attain sustainable growth, and become more resilient. The work we do is not only transforming the world of business; it is also helping defend governments and advance scientific and medical research. From nonprofits to companies of all sizes, millions of people use our tools to streamline supply chains, make HR more human, quickly pivot to a new financial plan, and connect data and people around the world.

Job ID: 130146301