Key Responsibilities:
- Design, implement, and maintain end-to-end ML pipelines for model training, evaluation, and deployment
- Collaborate with data scientists and software engineers to operationalize ML models
- Develop and maintain CI/CD pipelines for ML workflows
- Implement monitoring and logging solutions for ML models
- Optimize ML infrastructure for performance, scalability, and cost-efficiency
- Ensure compliance with data privacy and security regulations

Required Skills and Qualifications:
- Strong programming skills in Python, with experience using ML frameworks
- Expertise in containerization technologies (Docker) and orchestration platforms (Kubernetes)
- Proficiency with cloud platforms (AWS) and their ML-specific services
- Experience with MLOps tools
- Strong understanding of DevOps practices and tools (e.g., GitLab, Artifactory, Gitflow)
- Knowledge of data versioning and model versioning techniques
- Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack)
- Knowledge of distributed training techniques
- Experience with ML model serving frameworks (TensorFlow Serving, TorchServe)
- Understanding of ML-specific testing and validation techniques