Responsible for designing, building, and deploying large-scale, high-performance ML solutions grounded in strong computer science fundamentals. The role focuses on developing deep learning models, optimizing inference systems for low latency and high throughput, and integrating them into distributed production environments. The position also involves improving system efficiency, ensuring secure and compliant architectures, and collaborating with cross-functional teams to convert research models into production-grade applications.
Accountabilities:
- End-to-End ML Pipeline Development
  - Build and optimize model training, evaluation, and deployment pipelines for large-scale production environments.
- High-Performance Inference Engineering
  - Architect and scale distributed inference systems capable of processing large request volumes with minimal latency and efficient compute utilization.
- Model Optimization and Tuning
  - Implement model refinement techniques such as quantization, pruning, ONNX/TensorRT acceleration, and GPU-level optimizations to improve real-time inference performance.
- Data Engineering and Processing
  - Develop robust data ingestion, preprocessing, and augmentation frameworks for structured, unstructured, and multimodal datasets while maintaining data integrity and quality.
Model Deployment & Performance Metrics:
1. Reduction in inference latency, compute cost, and system memory footprint
2. Successful deployment of ML models meeting defined SLAs (e.g., throughput, latency)
Pipeline Reliability & Efficiency:
1. Uptime, stability, and scalability of ML services in production
2. Faster development cycle time through optimized pipelines and tooling
Educational Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, or a related field
Skills Required (Technical and/or Behavioral)
- Strong fundamentals in computer architecture, operating system internals, and system design.
- Experience in designing and maintaining scalable API-based systems.
- (Preferred) Proficiency in Python with deep learning frameworks such as PyTorch and TensorFlow, as well as experience in system-level programming (C++ preferred).
- Experience building high-performance APIs using frameworks such as FastAPI.
- (Preferred) Knowledge of computer vision and NLP frameworks including OpenCV and HuggingFace Transformers.
- (Preferred) Familiarity with MLOps tools and platforms such as MLflow, Kubeflow, Airflow, Docker, and Kubernetes.