Role summary:
Hiring a Machine Learning (ML) Engineer for a leading provider of firmware and platform-level software.
Company description:
Our client is a high-end technology company whose software forms part of the backbone of enterprise IT infrastructure. The company is headquartered in the US and has a global footprint. Its technology is integrated into millions of devices worldwide, including servers, desktops, and embedded systems. With decades of experience in low-level systems development, the company plays a critical role in shaping the foundational software that powers modern computing platforms.
Role details:
- Title / Designation: Machine Learning (ML) Engineer
- Location: Kolkata
- Experience: 7+ years
Role & responsibilities:
- Build, train, and fine-tune LLMs using techniques such as SFT, LoRA/QLoRA, and RLHF, and evaluate their performance.
- Optimize models for local or edge environments by improving speed and efficiency through quantization, pruning, and distillation.
- Deploy models into production on-premises or at the edge using frameworks such as PyTorch, ONNX, TensorRT, vLLM, or llama.cpp, and integrate them into applications via APIs and internal services.
- Design and maintain scalable training and inference pipelines to ensure reproducibility and efficiency.
- Monitor model performance in production, including accuracy, drift, latency, and resource utilization, and continuously optimize outcomes.
- Ensure models meet security, privacy, and compliance requirements, especially in restricted or offline environments.
- Collaborate with software engineers, infrastructure teams, and domain experts to deliver end-to-end AI solutions.
- Document model architectures, training processes, and deployment workflows for clarity and future use.
Candidate requirements:
- Master's or Ph.D. in Computer Science, Engineering, or a related field, or equivalent practical experience. 7+ years of experience in AI/ML, including at least 2 years working on LLMs, large-scale neural networks, RAG, or AI-driven automation.
- Strong hands-on experience with LLMs such as LLaMA, Mistral, Falcon, or similar open-weight models, along with proficiency in Python and frameworks like PyTorch or TensorFlow.
- Expertise in vector databases and retrieval systems (FAISS, Weaviate, Chroma, Pinecone, Milvus) and experience building RAG-based solutions.
- Experience developing and deploying models in local, on-premises, or resource-constrained environments, with a solid understanding of model optimization techniques like quantization, batching, and memory optimization.
- Hands-on experience with multi-agent AI systems (LangGraph, CrewAI, AutoGen, OpenAI Assistants API) and building autonomous or AI-driven workflows.
- Strong experience in end-to-end model development, working with business stakeholders to define KPIs and delivering multi-modal (text and image) or ensemble models.
- Familiarity with Linux, Docker, and basic cloud or on-prem infrastructure concepts.
- Experience with distributed training, multi-GPU systems, and handling large-scale models (10B+ parameters or multi-billion token datasets) is a plus.
- Knowledge of inference optimization tools such as vLLM, TensorRT-LLM, and ONNX, along with exposure to MLOps tools for model versioning and monitoring.
- Background in working with security-sensitive or regulated environments (such as finance, healthcare, or government) is preferred.
Selection process:
- Two technical rounds
- One HR round
Recruiter details: