Role: MLOps Engineer/Architect
Location: Hyderabad (Remote)
Industry: Healthcare/Payer (required)
Qualifications
- Cross-Functional Collaboration and Stakeholder Management: Partner with data science, product management, engineering, and business teams to understand their requirements and ensure the MLOps platform effectively supports their needs
- Cross-Disciplinary Knowledge: Apply knowledge from related disciplines, such as data science and the health and biological sciences, to design holistic MLOps solutions that meet the unique needs of the organization
- 8+ years of experience in MLOps, data engineering, or a related role (required)
- Service and Quality Excellence: Ability to demonstrate an uncompromising commitment to delivering exceptional care and unmatched value for our patients
- Honor our Mission and Values: Ability to build trust and act with authenticity to cultivate a culture of integrity, inclusion, and mutual respect
- Attain and Leverage Strategic Relationships: Ability to develop and strengthen collaborative relationships with both internal and external stakeholders
Responsibilities
- This role combines deep cloud architecture expertise with advanced AI/ML knowledge to develop solutions that streamline workflows, enable seamless collaboration, and drive innovation
- As a key contributor to the organization's AI/ML strategy, you will partner with cross-functional teams, including data engineers/scientists, product managers, and cloud engineers, to align platform development with business objectives
- Your work will directly support the deployment of Responsible AI solutions that prioritize transparency, fairness, and ethical practices
- Platform Development: Lead the enhancement of the MLOps platform to improve the developer experience for data and ML engineers; optimize workflows by integrating state-of-the-art tools and technologies, ensuring scalability and efficiency
- Cloud Infrastructure Design and Management: Architect and manage the cloud infrastructure supporting the MLOps platform, leveraging infrastructure-as-code (IaC) tools such as Terraform, and optimize it for scalability, security, cost-effectiveness, and high availability
- Effectively communicate technical concepts and strategies to both technical and non-technical audiences
- AI/ML Reliability and Observability: Collaborate with the AI/ML reliability engineering team to design and implement components that ensure the platform's operational reliability, observability, and fault tolerance
- DevOps for Machine Learning Workloads: Build and maintain robust DevOps pipelines tailored for ML workflows, enabling automated model training, testing, deployment, and monitoring
- Tool Development and System Reliability: Design and manage tools to enhance platform reliability, including dashboards, logging systems, and alerting frameworks, to ensure seamless operations
- Practice and adhere to the Code of Conduct philosophy and the Mission and Value Statement