Key Responsibilities
End-to-End ML System Development
- Design, build, and maintain the full ML lifecycle from data ingestion and feature computation to model training, serving, and monitoring.
- Develop low-latency, high-throughput serving systems with effective caching strategies.
- Build stream processing pipelines for real-time feature computation powering dynamic content personalization.
MLOps & Operational Excellence
- Establish best practices using MLflow for experiment tracking, model versioning, and deployment automation.
- Monitor technical health and model quality metrics using tools such as Grafana, Prometheus, and CloudWatch.
- Implement automated remediation systems to handle faults in resources, data streams, or model responses.
Experimentation & Optimization
- Create advanced experimentation frameworks enabling rapid A/B testing of ML algorithms.
- Scale and optimize training and serving of ML models, including specialized solutions for embeddings and high-dimensional features.
Collaboration & Project Delivery
- Collaborate with Product Managers and cross-functional teams to understand business requirements and deploy production-grade ML solutions.
- Take full ownership of code, design, and project delivery, ensuring quality, scalability, and timely completion.