3+ years of experience deploying and optimizing machine learning models in production, including 1+ years deploying deep learning models
Experience deploying asynchronous inference APIs (e.g., FastAPI, gRPC, Ray Serve)
Understanding of PyTorch internals and inference-time optimization
Familiarity with LLM inference runtimes (e.g., vLLM, TGI, TensorRT-LLM, ONNX Runtime)
Familiarity with GPU profiling tools (e.g., NVIDIA Nsight, nvtop) and model quantization pipelines