AI Inference Optimization Engineer

ANI Calls India Private Limited

Hyderabad

1-5 Years

Posted 7 hours ago
Over 50 applicants

Job Description

About the Role

We are seeking an AI Inference Optimization Engineer to design, build, and support high-performance model-serving pipelines for scalable enterprise AI applications. The ideal candidate will work closely with business, data, and engineering teams to deliver secure, scalable, and measurable AI solutions while optimizing inference performance, resource utilization, and deployment efficiency.

Key Responsibilities

Design and develop high-performance AI inference and model-serving pipelines.
Optimize large language model inference using vLLM and TensorRT-LLM.
Improve GPU utilization through batching, caching, and request scheduling techniques.
Build scalable and reliable AI serving infrastructure for enterprise applications.
Deploy and manage inference workloads in Kubernetes-based environments.
Monitor system performance, latency, throughput, and infrastructure utilization.
Collaborate with AI engineers, data scientists, platform teams, and business stakeholders.
Implement observability, monitoring, and alerting solutions for AI services.
Continuously improve inference efficiency and scalability while reducing serving cost.
Ensure security, reliability, and governance standards are followed throughout the AI lifecycle.

Required Skills

Hands-on experience with vLLM
Knowledge of TensorRT-LLM
Strong understanding of GPU-based inference optimization
Experience with batching and caching techniques
Proficiency in Kubernetes
Experience with monitoring and observability tools
Understanding of scalable AI serving architectures

Experience Requirements

Up to 5 years of overall experience
Minimum 1–2 years of relevant hands-on experience in AI inference, model serving, MLOps, or related technologies