Design, build, and maintain scalable ML pipelines in on-premises environments, ensuring high availability and reliability.
Implement and manage Infrastructure as Code (IaC) using tools such as Ansible, Terraform (for private cloud), or Puppet across development, testing, and production environments.
Extend and enhance existing ML workflows to support evolving data science needs through small, targeted changes that deliver high impact.
Act as the infrastructure and MLOps subject-matter expert (SME), collaborating with data scientists to guide and support model deployment and operationalization.
Document system architecture, infrastructure usage, and design decisions using architectural diagrams and tools such as Confluence and GitHub Wikis.
Research and implement optimizations for ML workflows, compute resource utilization, and storage management.
Lead cross-functional initiatives for ML product deployment, re-platforming, and modernization in the on-premises environment.