Model Compression and Quantization Engineer

ANI Calls India Private Limited

Hyderabad

1-5 Years

Posted 7 hours ago

Job Description

About the Role

We are looking for a Model Compression and Quantization Engineer to design, build, and support optimized AI models for lower-cost, faster, and edge-ready inference. The ideal candidate will collaborate with business, data, and engineering teams to deliver secure, scalable, and measurable AI solutions while improving model efficiency and deployment performance.

Key Responsibilities

Design and implement model compression strategies to optimize AI models for production environments.
Apply quantization and pruning techniques to reduce model size and improve inference speed.
Convert and optimize models using ONNX and TensorRT for deployment across various platforms.
Perform model benchmarking to evaluate latency, throughput, memory usage, and accuracy trade-offs.
Develop and maintain optimization pipelines using Python and AI frameworks.
Collaborate with data scientists, ML engineers, and business stakeholders to deliver efficient AI solutions.
Support deployment of optimized models for cloud, edge, and embedded environments.
Monitor model performance and recommend improvements for scalability and cost optimization.
Ensure AI solutions comply with security, reliability, and governance standards.

Required Skills

Strong understanding of model quantization techniques
Experience with model pruning and compression methods
Hands-on experience with ONNX and TensorRT
Expertise in model benchmarking and performance optimization
Proficiency in Python
Understanding of AI model deployment and inference optimization

Experience Requirements

Up to 5 years of overall experience
Minimum 1–2 years of relevant hands-on experience in model optimization, compression, quantization, or related AI technologies