
Qubrid AI

Senior GPU and AI Infrastructure Engineer

  • Posted a day ago

Job Description

Work from Home. Join one of the most advanced AI companies in the world.

Note: this role requires working late nights India time (until 4 AM) to overlap with US working hours.

We are searching for a startup-minded engineer with hands-on experience in NVIDIA GPU technologies and open-source AI models, someone who has previously owned an entire product and roadmap from an AI and back-end standpoint. You must be a coder, not just a DevOps manager, and willing to go above and beyond to launch a world-class product.

This is a full-time role. If you plan to hold two or more jobs at the same time, or want to work part-time, this role is not a fit; in that case, please do not apply.

Salary depends on experience and current verifiable compensation (paychecks).

Company Description

Headquartered in McLean, Virginia, USA, Qubrid is a global provider of Artificial Intelligence (AI), Data Center and IoT products, solutions, and services. As pioneers in the realm of advanced computing technologies, we pride ourselves on being at the forefront of innovation, empowering businesses with the transformative capabilities of GPUs, Artificial Intelligence, Quantum Computing, IoT and more. We specialize in offering a wide array of hardware and software solutions for industries such as healthcare, manufacturing, finance, government, education and more.


Senior GPU & AI Infrastructure Engineer
About the Role

We are looking for a highly experienced Senior GPU & AI Infrastructure Engineer to build and optimize production-scale AI systems focused on large language models (LLMs), multimodal models, and high-performance inference infrastructure.

This is a hands-on engineering role for someone who has real experience deploying AI products at scale—not just training models in research environments.

You will work on:

  • LLM deployment and serving
  • high-throughput inference systems
  • GPU optimization
  • distributed AI infrastructure
  • cloud GPU environments
  • multi-tenant AI workloads
  • model optimization and batching systems

The ideal candidate understands both AI systems and low-level infrastructure performance.

Responsibilities
AI Model Deployment & Serving
  • Deploy and maintain production-grade LLM and multimodal inference systems
  • Build scalable APIs and serving infrastructure for AI products
  • Implement high-throughput and low-latency inference pipelines
  • Design systems for:
      • batch inferencing
      • streaming inference
      • concurrent request handling
      • model routing
      • autoscaling
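To make the batching and concurrent-request-handling responsibilities concrete, here is a minimal pure-Python sketch of the dynamic micro-batching idea: queued requests are flushed as a batch once the batch is full or the oldest request has waited past a deadline. The class name and thresholds are illustrative assumptions, not any framework's API; production servers such as vLLM or Triton implement this internally with far more sophistication.

```python
import time
from collections import deque

class MicroBatcher:
    """Toy dynamic batcher: flushes a batch when it is full or when the
    oldest queued request has waited longer than max_wait_s."""

    def __init__(self, max_batch_size=8, max_wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = deque()  # (request, enqueue_time) pairs

    def submit(self, request):
        self.queue.append((request, time.monotonic()))

    def next_batch(self):
        """Return a batch if the size or latency threshold is met, else []."""
        if not self.queue:
            return []
        oldest_age = time.monotonic() - self.queue[0][1]
        if len(self.queue) >= self.max_batch_size or oldest_age >= self.max_wait_s:
            take = min(self.max_batch_size, len(self.queue))
            return [self.queue.popleft()[0] for _ in range(take)]
        return []

batcher = MicroBatcher(max_batch_size=4, max_wait_s=0.05)
for i in range(6):
    batcher.submit(f"prompt-{i}")
first = batcher.next_batch()  # full batch of 4 flushes immediately
```

The size threshold maximizes GPU utilization while the wait deadline bounds tail latency for requests that arrive when traffic is light, which is the core trade-off behind batching in inference serving.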
GPU Optimization & Performance Engineering
  • Optimize NVIDIA GPU environments for maximum throughput and efficiency
  • Work with:
      • CUDA
      • TensorRT
      • NCCL
      • ONNX Runtime
      • vLLM
      • Triton Inference Server
      • NVIDIA Dynamo
  • Improve:
      • GPU memory utilization
      • token throughput
      • batching efficiency
      • inference latency
      • GPU scheduling and allocation
  • Diagnose and resolve GPU bottlenecks, memory fragmentation, and scaling issues
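Diagnosing GPU memory pressure in LLM serving usually starts with back-of-envelope KV cache math. The sketch below shows the standard estimate (two tensors, K and V, per layer, each `num_kv_heads * head_dim` elements per token); the 7B-class configuration plugged in is an illustrative assumption, not a specific model's published spec.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, dtype_bytes=2):
    """Estimate KV cache size: 2 tensors (K and V) per layer, each
    num_kv_heads * head_dim elements per token, at dtype_bytes precision."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return per_token * seq_len * batch_size

# Illustrative 7B-class config: 32 layers, 32 KV heads, head_dim 128, fp16.
# At 4096-token context and batch size 8 the cache alone is 16 GiB:
gib = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=8) / 2**30
```

Numbers like this explain why techniques such as grouped-query attention (fewer KV heads) and paged KV cache allocation matter so much for batching capacity on a fixed-memory GPU.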
Infrastructure & Distributed Systems
  • Design scalable AI infrastructure across cloud and dedicated GPU environments
  • Work with distributed inference and multi-node deployments
  • Implement GPU partitioning and resource isolation strategies
  • Manage containerized AI workloads using Docker and orchestration systems
  • Build infrastructure for resilient and fault-tolerant AI services
Cloud & Production Operations
  • Deploy and manage AI systems across providers such as:
      • AWS
      • GCP
      • Azure
      • bare-metal GPU clusters
  • Monitor infrastructure reliability, scaling, and cost efficiency
  • Build observability systems for GPU and inference monitoring
  • Create CI/CD workflows for AI deployments
Engineering & Documentation
  • Write production-grade infrastructure and deployment code
  • Create technical documentation and deployment runbooks
  • Participate in architecture reviews and infrastructure planning
  • Collaborate with AI engineers, backend engineers, and product teams
Requirements
Must-Have
  • Strong hands-on experience deploying LLMs or AI models in production
  • Deep understanding of NVIDIA GPU architecture and optimization
  • Experience with large-scale inference systems and batching strategies
  • Strong Linux, Python, and systems engineering background
  • Experience with containerization and distributed systems
  • Familiarity with model optimization techniques including:
      • quantization
      • KV cache optimization
      • tensor parallelism
      • pipeline parallelism
      • memory optimization
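Of the optimization techniques above, quantization has the most direct memory arithmetic: weight footprint scales linearly with bits per weight. A minimal sketch, using a 70B-parameter model as an illustrative assumption and ignoring activations and KV cache:

```python
def weight_memory_gib(num_params_b, bits_per_weight):
    """Approximate weight memory (GiB) for a model with num_params_b
    billion parameters stored at bits_per_weight precision.
    Ignores activations, KV cache, and framework overhead."""
    bytes_total = num_params_b * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

fp16 = weight_memory_gib(70, 16)  # ~130.4 GiB: needs multiple GPUs
int4 = weight_memory_gib(70, 4)   # ~32.6 GiB: fits on one large GPU
```

The 4x reduction from fp16 to int4 is why quantization is often the first lever pulled when a model must fit a given GPU budget, before reaching for tensor or pipeline parallelism across devices.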
Preferred
  • Experience with multi-tenant AI serving systems
  • Experience building AI products used in production environments
  • Familiarity with Kubernetes and GPU orchestration
  • Experience with fine-tuning pipelines and model training infrastructure
  • Understanding of networking and high-performance compute systems
What We're Looking For
  • Someone who has built and operated real AI infrastructure at scale
  • Strong systems and performance engineering mindset
  • Ability to diagnose deep infrastructure and GPU-level issues
  • Product-focused engineer who understands reliability and user experience
  • Fast execution with strong ownership mentality

Compensation

Competitive salary depending on experience.

To apply, please include:

  • Relevant infrastructure or deployment experience
  • GPU systems and frameworks worked with
  • Links to GitHub, projects, or deployed AI systems if available

Job ID: 147319205