Description
We are looking for an MLOps Engineer with 3–5 years of experience building and operating production ML systems and meaningful exposure to the pharma or healthcare domain. You will take data science outputs and make them robust, scalable, and maintainable in production. The role covers the full lifecycle, from pipeline orchestration and model deployment to monitoring, retraining, and compliance, across client engagements.
You will work closely with data scientists, data engineers, and business stakeholders to bridge the gap between experimental models and production-grade systems. Experience navigating regulated or data-governance-sensitive environments is a strong plus.
Key Responsibilities
- Own and operate end-to-end ML pipelines from feature engineering and training runs through deployment, monitoring, and scheduled retraining, on cloud infrastructure (AWS, Azure, or GCP).
- Implement and maintain CI/CD pipelines for ML workflows, ensuring reproducibility, version control of models and data, and reliable rollback capabilities.
- Build and orchestrate ML and data pipelines using tools such as Airflow or Prefect; manage dependencies, scheduling, and failure handling.
- Build, manage, and govern ML workflows on Databricks, the primary platform for data and ML pipeline execution, using Jobs, Delta Live Tables, and Unity Catalog.
- Manage the full model lifecycle using MLflow on Databricks: experiment tracking, model registry, versioning, stage transitions, and lineage documentation.
- Design and deploy ML pipelines on Kubernetes using Kubeflow Pipelines for clients who operate outside Databricks environments, including pipeline authoring, component containerization, and run management.
- Containerize and deploy models as scalable APIs or batch inference services using Docker and Kubernetes (or managed equivalents); define SLAs and monitor adherence.
- Instrument production models to detect performance drift, data quality degradation, and upstream schema changes; define alerting and intervention protocols.
- Collaborate with data scientists to translate experimental notebooks into maintainable, testable, production-ready code.
- Maintain experiment tracking and model registries (MLflow or equivalent); enforce model lineage and reproducibility standards.
- Partner with data governance and compliance teams to ensure ML pipelines meet audit, traceability, and access-control requirements relevant to pharma/healthcare data.
- Participate in code reviews, architecture discussions, and documentation, raising the engineering bar across the DS/ML team.
- Develop visualization layers and support self-service analytics platforms.
Required Skills & Qualifications
- 3–5 years of hands-on MLOps or ML engineering experience in a production environment.
- Bachelor's or master's degree.
- Strong proficiency in Python, including packaging, testing, and writing production-grade code, plus comfort writing complex queries over large datasets in SQL and PySpark.
- Hands-on experience with Databricks beyond Spark compute, including Jobs, Delta Lake, Unity Catalog, and the Databricks Runtime for Machine Learning, as well as MLflow on Databricks for experiment tracking, model registry (including the Unity Catalog-backed registry), lifecycle management, and model serving.
- Docker and Kubernetes (or equivalents) for containerization and scalable model serving.
- Experience authoring, deploying, and managing ML pipelines on Kubeflow; comfortable working within existing Kubernetes clusters on client infrastructure.
- CI/CD pipelines for ML workflows (GitHub Actions, Azure DevOps, or equivalent).
- REST API design and deployment of model-serving endpoints.
- Experience with cloud platforms (AWS, Azure, or GCP) for infrastructure management and managed ML services.
- Model monitoring, drift detection, and alerting in production.
- Git and version control best practices; Linux and Bash scripting.
- Hands-on experience with US pharma data sources such as claims, EHR, or Rx data.
Nice to have
- SageMaker, Vertex AI, or other managed ML platforms.
- OpenShift or other enterprise Kubernetes distributions commonly found in pharma client environments.
- Enough depth in scikit-learn, XGBoost, PyTorch, or TensorFlow to debug and optimize the models data scientists hand over.
- Terraform or other IaC tooling for ML infrastructure.
- OMOP, FHIR, or HL7 familiarity.
- Awareness of 21 CFR Part 11, GxP, or audit trail requirements in pharma ML contexts.
- LLM/GenAI pipeline experience: prompt versioning, evaluation, latency monitoring.
Skills: cloud, MLOps, ML deployment