Infinite Computer Solutions

Technical Lead

This job is no longer accepting applications

  • Posted 2 months ago

Job Description

Senior MLOps / LLMOps Engineer (Databricks Expert) - Job Description

Introduction

Join an amazing company where you can work with cutting-edge technologies and platforms. Give your career an Infinite edge, with a stimulating environment and a global work culture. Be a part of an organization where we celebrate integrity, innovation, collaboration, teamwork, and passion. A culture where every employee is a leader delivering ideas that make a difference to this world we live in.

The MLOps / LLMOps Engineer's responsibilities include, but are not limited to:

  • Design, build, and operate end-to-end MLOps and LLMOps pipelines for training, deployment, monitoring, and lifecycle management of ML and generative AI models.
  • Lead Databricks-based ML and LLM platforms using MLflow, Model Registry, Feature Store, and Databricks Workflows.
  • Deploy and operate ML and LLM models in production with scalability, reliability, and high availability.
  • Architect and optimize high-performance distributed ML and LLM training pipelines on Databricks using advanced Spark tuning, autoscaling policies, optimized cluster configurations, and Photon execution.
  • Implement high-performance inference architectures, including GPU-accelerated model serving, vector search indexing optimization, and low-latency LLM deployments.
  • Build mission-critical ML/LLM systems with strict SLAs for throughput, latency, scalability, and resilience, ensuring 24/7 production readiness.
  • Lead implementation of automated retraining and evaluation frameworks with configurable thresholds for drift, quality degradation, and model reliability.
  • Implement cost-efficient ML and LLM operations, leveraging cluster policy enforcement, job orchestration patterns, caching strategies, and compute-aware model design.
  • Implement CI/CD pipelines for ML workflows including model versioning, testing, validation, and automated deployment.
  • Operationalize LLM-based applications including RAG pipelines, embeddings, vector search, and prompt lifecycle management.
  • Monitor model performance, drift, latency, bias, and cost with alerting and retraining strategies.
  • Collaborate with data scientists, data engineers, and platform teams for secure and reproducible ML solutions.
  • Define governance, lineage, reproducibility, and compliance standards for ML and LLM systems.
  • Integrate Databricks ML workloads with Azure services such as Azure ML, ADLS Gen2, Key Vault, and Azure DevOps.
  • Troubleshoot distributed ML pipelines and production inference services.
  • Mentor teams on MLOps and LLMOps best practices.

In addition to the qualifications listed below, the ideal candidate will demonstrate the following traits:

  • Experience with advanced MLOps/LLMOps reliability engineering, including rate limiting, autoscaling, circuit breaking, caching, and SLA management.
  • Ownership mindset for production-grade ML systems.
  • Ability to bridge experimentation and enterprise deployment.
  • Passion for automation and reliability.
  • Strong communication and collaboration skills.
  • Proactive approach to performance, cost, and scalability.
  • Curiosity for evolving Generative AI technologies.

Minimum Qualifications

  • Bachelor's degree in Computer Science, Engineering, AI, or related field.
  • 7+ years of experience in ML engineering, data engineering, or platform engineering.
  • Hands-on experience with MLOps and LLMOps pipelines in production.
  • Strong expertise with Databricks for ML workloads.
  • Experience deploying ML and LLM models in Azure environments.
  • Proficiency in Python and ML frameworks.
  • Experience with CI/CD for ML systems.
  • Knowledge of model monitoring, drift detection, and retraining.
  • Experience with Docker and Kubernetes.
  • Understanding of AI security, governance, and compliance.
  • Strong English communication skills.

Preferred Qualifications

  • Experience with RAG architectures, vector databases, embeddings, and prompt engineering.
  • Advanced Databricks capabilities including Unity Catalog and Lakehouse AI.
  • Familiarity with Azure AI and enterprise AI governance.
  • Responsible AI and ethics experience.
  • Agile/Scrum delivery experience.
  • Relevant certifications in Databricks or Azure AI.

Qualifications: BE

Experience Range: 4 to 8 years

More Info

Job ID: 139014177
