Key Responsibilities:
ML Platform & Pipelines:
- Design and implement end-to-end ML pipelines for training, validation, packaging, and deployment.
- Create reusable templates and tooling for experimentation, feature consumption, and model lifecycle management.
CI/CD & Release Engineering:
- Build CI/CD pipelines for ML (tests, quality gates, approvals) using Azure DevOps or GitHub Actions.
- Manage model registry, artifact versioning, and environment reproducibility.
Deployment, Monitoring & Reliability:
- Implement batch and real-time serving patterns; automate monitoring for drift, performance, and data quality.
- Establish SLOs/SLAs for ML services; lead incident response and root-cause analysis for ML production issues.
Cloud & Security:
- Operate ML infrastructure on Azure (preferred), including compute, networking, IAM, and secrets management.
- Apply governance: access controls, auditability, and compliance requirements.
Technical Qualifications Needed:
Education:
Bachelor's or Master's degree in Computer Science, Engineering, Statistics, or a related field.
Required Skills:
- 7+ years of experience in MLOps, DevOps, Data Engineering, Data Science, or related roles.
- Hands-on experience with Azure ML and/or Databricks, including MLflow and Databricks Asset Bundles.
- Kubernetes and orchestration experience; familiarity with model serving frameworks.
- Strong Python and SQL skills; experience with automated testing and observability.
- Experience with monitoring/alerting (e.g., Azure Monitor, Prometheus/Grafana).
Desired Skills:
- Infrastructure-as-Code (Terraform/Bicep) and policy-as-code experience.
- Experience with feature stores and data quality frameworks.
- Experience supporting regulated data environments and security reviews.
Certifications:
Azure DevOps Engineer Expert, Azure Data Engineer Associate, or similar certifications preferred.