Lyric is an AI-first, platform-based healthcare technology company committed to simplifying the business of care by preventing inaccurate payments and reducing overall waste in the healthcare ecosystem, enabling more efficient use of resources to reduce the cost of care for payers, providers, and patients. Lyric, formerly ClaimsXten, is a market leader with 35 years of pre-pay editing expertise, dedicated teams, and top technology. Lyric is proud to be recognized as 2025 Best in KLAS for Pre-Payment Accuracy and Integrity, to be HITRUST and SOC 2 certified, and to be a recipient of the 2025 CandE Award for Candidate Experience. Interested in shaping the future of healthcare with AI? Explore opportunities at lyric.ai/careers and drive innovation with #YouToThePowerOfAI.
Job Summary
We are looking for a Machine Learning Engineer with strong production experience who enjoys working at the intersection of engineering, operations, and customer support. This role is focused on supporting, troubleshooting, and improving production ML pipelines and data workflows, while serving as a key technical point of contact for customer-facing teams.
You will investigate customer-reported issues, diagnose data and workflow failures in production environments, and implement fixes or incremental improvements to existing ML systems. While this role includes contributing new code and enhancements, production support and operational ownership are its primary responsibilities.
This position is ideal for someone who is hands-on, pragmatic, and motivated by keeping ML systems reliable, observable, and usable in real-world environments.
Job Responsibilities
Production Support & Customer Enablement
- Act as the primary technical point of contact for investigating and resolving customer-reported issues related to ML models, data pipelines, and workflows.
- Troubleshoot production failures across data ingestion, feature engineering, model execution, and orchestration layers.
- Analyze logs, metrics, and data to identify root causes of issues in live environments.
- Partner closely with Customer Success and Product teams to communicate findings, workarounds, and long-term fixes.
- Provide guidance to customers on best practices for operating and monitoring ML workflows.
Model Maintenance, Enhancement & Deployment
- Maintain, debug, and enhance existing ML models built with PyTorch or TensorFlow.
- Implement targeted model improvements or fixes based on production issues, performance degradation, or changing data patterns.
- Support model evaluation, validation, and retraining workflows rather than building new models from scratch.
- Deploy fixes and enhancements safely across development, staging, and production environments.
MLOps & Operational Workflows
- Operate and support existing ML workflows built with Airflow, Kedro, and MLflow.
- Debug failed pipelines, broken dependencies, and environment-specific issues.
- Improve reliability, observability, and documentation of ML workflows over time.
- Participate in CI/CD processes for ML systems, primarily focused on safe deployments and rollback strategies.
Data Engineering & Processing
- Handle large-scale datasets efficiently using distributed computing frameworks (Dask, Spark).
- Ensure data quality, consistency, and compliance with governance standards.
- Exposure to Snowflake or Databricks is a plus.
Analytics & Visualization
- Collaborate with business and analytics teams to translate ML outputs into actionable insights.
- Design and develop dashboards and reports using Power BI or similar BI tools.
- Perform exploratory data analysis (EDA) and communicate findings effectively to stakeholders.
- Build KPI-driven visualizations to monitor model performance and business impact.
Monitoring & Observability
- Implement model drift detection, performance tracking, and automated retraining strategies.
- Use experiment tracking tools (MLflow, Weights & Biases) for transparency and reproducibility.
Collaboration & Documentation
- Work closely with data scientists, software engineers, and product teams to align ML solutions with business goals.
- Document ML workflows, best practices, and operational guidelines.
Required Qualifications
- 3–6 years of experience supporting, deploying, or operating machine learning systems in production environments.
- Strong proficiency in Python and libraries like Pandas, Dask, NumPy, Scikit-learn.
- Hands-on experience with PyTorch or TensorFlow for model development.
- Solid understanding of MLOps tools: Airflow, Kedro, MLflow (or equivalents).
- Experience deploying ML models in production environments (APIs, batch jobs, streaming).
- Familiarity with containerization (Docker) and orchestration (Kubernetes).
- Exposure to cloud platforms (Azure, AWS, or GCP) for ML workloads.
- Experience with Power BI or similar BI tools for analytics and visualization.
- Strong problem-solving skills and ability to work in agile, fast-paced environments.
Preferred Qualifications
- Experience with feature stores (Feast, Tecton) and data versioning tools (DVC).
- Knowledge of distributed training and GPU optimization.
- Familiarity with experiment management tools (Weights & Biases, Neptune.ai).
- Understanding of model explainability and responsible AI practices.
- Contributions to open-source ML projects or technical blogs.