  • Posted 8 hours ago

Job Description

Senior MLOps / LLMOps Engineer (Databricks Expert) - Job Description

Introduction

Join an amazing company where you can work with cutting-edge technologies and platforms. Give your career an Infinite edge, with a stimulating environment and a global work culture. Be a part of an organization where we celebrate integrity, innovation, collaboration, teamwork, and passion. A culture where every employee is a leader delivering ideas that make a difference to this world we live in.

The MLOps / LLMOps Engineer's responsibilities include, but are not limited to:

  • Design, build, and operate end-to-end MLOps and LLMOps pipelines for training, deployment, monitoring, and lifecycle management of ML and generative AI models.
  • Lead Databricks-based ML and LLM platforms using MLflow, Model Registry, Feature Store, and Databricks Workflows.
  • Deploy and operate ML and LLM models in production with scalability, reliability, and high availability.
  • Architect and optimize high-performance distributed ML and LLM training pipelines on Databricks using advanced Spark tuning, autoscaling policies, optimized cluster configurations, and Photon execution.
  • Implement high-performance inference architectures, including GPU-accelerated model serving, vector search indexing optimization, and low-latency LLM deployments.
  • Build mission-critical ML/LLM systems with strict SLAs for throughput, latency, scalability, and resilience, ensuring 24/7 production readiness.
  • Lead implementation of automated retraining and evaluation frameworks with configurable thresholds for drift, quality degradation, and model reliability.
  • Implement cost-efficient ML and LLM operations, leveraging cluster policy enforcement, job orchestration patterns, caching strategies, and compute-aware model design.
  • Implement CI/CD pipelines for ML workflows including model versioning, testing, validation, and automated deployment.
  • Operationalize LLM-based applications including RAG pipelines, embeddings, vector search, and prompt lifecycle management.
  • Monitor model performance, drift, latency, bias, and cost with alerting and retraining strategies.
  • Collaborate with data scientists, data engineers, and platform teams for secure and reproducible ML solutions.
  • Define governance, lineage, reproducibility, and compliance standards for ML and LLM systems.
  • Integrate Databricks ML workloads with Azure services such as Azure ML, ADLS Gen2, Key Vault, and Azure DevOps.
  • Troubleshoot distributed ML pipelines and production inference services.
  • Mentor teams on MLOps and LLMOps best practices.

In addition to the qualifications listed below, the ideal candidate will demonstrate the following traits:

  • Experience with advanced MLOps/LLMOps reliability engineering, including rate limiting, autoscaling, circuit breaking, caching, and SLA management.
  • Ownership mindset for production-grade ML systems.
  • Ability to bridge experimentation and enterprise deployment.
  • Passion for automation and reliability.
  • Strong communication and collaboration skills.
  • Proactive approach to performance, cost, and scalability.
  • Curiosity for evolving Generative AI technologies.

Minimum Qualifications

  • Bachelor's degree in Computer Science, Engineering, AI, or related field.
  • 7+ years of experience in ML engineering, data engineering, or platform engineering.
  • Hands-on experience with MLOps and LLMOps pipelines in production.
  • Strong expertise with Databricks for ML workloads.
  • Experience deploying ML and LLM models in Azure environments.
  • Proficiency in Python and ML frameworks.
  • Experience with CI/CD for ML systems.
  • Knowledge of model monitoring, drift detection, and retraining.
  • Experience with Docker and Kubernetes.
  • Understanding of AI security, governance, and compliance.
  • Strong English communication skills.

Preferred Qualifications

  • Experience with RAG architectures, vector databases, embeddings, and prompt engineering.
  • Advanced Databricks capabilities including Unity Catalog and Lakehouse AI.
  • Familiarity with Azure AI and enterprise AI governance.
  • Responsible AI and ethics experience.
  • Agile/Scrum delivery experience.
  • Relevant certifications in Databricks or Azure AI.

Qualifications: BE

Experience range: 4 to 8 years

More Info

Job ID: 139014177