Senior ML Engineer

Recro

India

4-10 Years

Posted 21 hours ago

Job Description

About the Role

We are looking for a highly skilled Senior Machine Learning Engineer to build and scale next-generation generative AI systems. This role sits at the intersection of machine learning and backend infrastructure, focusing on taking advanced models from experimentation to reliable, high-performance production systems.

You will work on cutting-edge generative video and multimodal AI use cases, contributing to scalable, low-latency systems used by millions of users globally.

Key Responsibilities

Design, train, fine-tune, and evaluate generative and multimodal models (e.g., text-to-video, image-to-video, lip-sync, character consistency)
Build and manage end-to-end ML pipelines, including data ingestion, preprocessing, training, evaluation, and model versioning
Deploy and maintain scalable ML systems, including model serving, containerization, and GPU-optimized inference
Implement MLOps best practices such as experiment tracking, model monitoring, drift detection, and A/B testing
Optimize inference systems for low latency, high throughput, and cost-efficient GPU utilization
Develop batching and caching strategies to meet production SLAs
Collaborate with backend and platform teams to integrate ML services into distributed systems
Contribute to long-term AI strategy, including foundational model training and fine-tuning pipelines

Required Qualifications

4–10 years of experience in Machine Learning or Applied ML Engineering
Strong fundamentals in deep learning, Transformers, and generative model architectures
Hands-on experience with large-scale model training and fine-tuning (e.g., LoRA, full fine-tuning)
Proven experience deploying and scaling ML models in production environments
Strong understanding of MLOps practices and tools (e.g., MLflow, Weights & Biases)
Experience with model serving frameworks such as Triton, TorchServe, vLLM, or similar
Proficiency in Python and deep learning frameworks such as PyTorch
Experience working with cloud platforms (AWS, GCP, or Azure), including GPU provisioning and autoscaling
Ability to work in fast-paced, ambiguous environments with cross-functional teams

Preferred Qualifications

Experience with video generation, diffusion models, or multimodal architectures
Familiarity with LoRA/IC-LoRA techniques for character or identity consistency
Knowledge of inference optimization techniques such as quantization (FP8/INT8), batching, and GPU memory management
Experience with audio/video systems (e.g., TTS, voice cloning, lip-sync pipelines)
Background in media, OTT, or large-scale content platforms

What We Offer