ANI Calls India Private Limited

AI Inference Optimization Engineer

1–5 Years
  • Posted 7 hours ago
  • Over 50 applicants

Job Description

About the Role

We are seeking an AI Inference Optimization Engineer to design, build, and support high-performance model-serving pipelines for scalable enterprise AI applications. The ideal candidate will work closely with business, data, and engineering teams to deliver secure, scalable, and measurable AI solutions while optimizing inference performance, resource utilization, and deployment efficiency.

Key Responsibilities

  • Design and develop high-performance AI inference and model-serving pipelines.
  • Optimize large language model inference using vLLM and TensorRT-LLM.
  • Improve GPU utilization through batching, caching, and request scheduling techniques.
  • Build scalable and reliable AI serving infrastructure for enterprise applications.
  • Deploy and manage inference workloads in Kubernetes-based environments.
  • Monitor system performance, latency, throughput, and infrastructure utilization.
  • Collaborate with AI engineers, data scientists, platform teams, and business stakeholders.
  • Implement observability, monitoring, and alerting solutions for AI services.
  • Continuously improve inference efficiency, scalability, and cost-effectiveness.
  • Ensure security, reliability, and governance standards are followed throughout the AI lifecycle.

Required Skills

  • Hands-on experience with vLLM
  • Knowledge of TensorRT-LLM
  • Strong understanding of GPU-based inference optimization
  • Experience with batching and caching techniques
  • Proficiency in Kubernetes
  • Experience with monitoring and observability tools
  • Understanding of scalable AI serving architectures

Experience Requirements

  • Up to 5 years of overall experience
  • Minimum 1–2 years of relevant hands-on experience in AI inference, model serving, MLOps, or related technologies

Job ID: 149497979