  • Posted 8 hours ago

Job Description

AI Runtime Engineer

Location: Bangalore (or as per requirement)

Experience: 7+ years

Choosing Capgemini means joining a team where you'll be empowered to build cutting-edge AI infrastructure, supported by a collaborative global community, and inspired to reimagine what's possible. Join us in enabling scalable, fault-tolerant AI systems that power next-generation machine learning workloads.

Your Role

As an AI Runtime Engineer, you will design and optimize distributed AI runtimes that enable high-performance, multi-node, multi-GPU training at scale. You'll work closely with AI infrastructure teams to build elastic, fault-tolerant systems and ensure seamless orchestration for advanced AI workloads.

In this role, you will:

  • Architect and implement distributed AI runtime systems with elastic scaling and job recovery.
  • Optimize low-level performance (CUDA, NCCL, PyTorch internals) for multi-GPU workloads.
  • Develop custom runtime architectures for large-scale AI training pipelines.
  • Integrate orchestration tools such as Kubernetes, Ray, TorchElastic, and Horovod for containerized AI workloads.
  • Implement fault recovery mechanisms and observability hooks for runtime health monitoring.
  • Collaborate with AI researchers and platform engineers to ensure efficient resource utilization and throughput optimization.
  • Contribute to CI/CD pipelines for AI infrastructure and runtime deployments.

Your Profile

Mandatory Skills:

  • Hands-on experience with distributed training systems and multi-node/multi-GPU orchestration.
  • Expertise in PyTorch internals, CUDA, NCCL, and performance profiling.
  • Strong knowledge of Kubernetes, containerization, and orchestration frameworks.

Preferred Skills:

  • Experience with TorchElastic, Ray, or Horovod.
  • Open-source contributions to PyTorch or runtime libraries.
  • Background in HPC, compilers, or systems research.

Education:

  • Bachelor's/Master's in Computer Science, Engineering, or a related field.

Job ID: 136722767