

MLOps and LLMOps Engineer

8-12 Years
Posted a month ago

Job Description

Key Responsibilities

  • Design, implement, and maintain end-to-end MLOps pipelines for model training, validation, deployment, and monitoring.
  • Build and manage LLMOps pipelines for fine-tuning, evaluating, and deploying large language models (e.g., OpenAI, HuggingFace Transformers, custom LLMs).
  • Use Kubeflow and Kubernetes to orchestrate reproducible, scalable ML/LLM workflows.
  • Implement CI/CD pipelines for ML projects using GitHub Actions, Argo Workflows, or Jenkins.
  • Automate infrastructure provisioning using Terraform, Helm, or similar IaC tools.
  • Integrate model registry and artifact management with tools like MLflow, Weights & Biases, or DVC.
  • Manage containerization with Docker and container orchestration via Kubernetes.
  • Set up monitoring, logging, and alerting for production models using tools like Prometheus, Grafana, and ELK Stack.
  • Collaborate closely with Data Scientists and DevOps engineers to ensure seamless integration of models into production systems.
  • Ensure model governance, reproducibility, auditability, and compliance with enterprise and legal standards.
  • Conduct performance profiling, load testing, and cost optimization for LLM inference endpoints.
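The pipeline responsibilities above can be sketched in miniature. The following is a deliberately simplified, dependency-free illustration of the train → validate → deploy gating pattern this role would automate; the function names and in-memory registry are hypothetical stand-ins, not the actual stack:

```python
import statistics

def train_model(data):
    """'Train' a trivial model: predict the mean of the training labels."""
    mean = statistics.fmean(y for _, y in data)
    return {"predict": lambda x: mean, "version": "v1"}

def validate_model(model, holdout, max_mae=1.0):
    """Gate deployment on a holdout metric, as a real pipeline stage would."""
    errors = [abs(model["predict"](x) - y) for x, y in holdout]
    mae = statistics.fmean(errors)
    return mae <= max_mae, mae

def deploy_model(model, registry):
    """Register the approved artifact (an MLflow-style registry in practice)."""
    registry[model["version"]] = model
    return model["version"]

# Toy (x, y) pairs standing in for real training and holdout datasets.
train = [(1, 2.0), (2, 2.2), (3, 1.8)]
holdout = [(4, 2.1), (5, 1.9)]
registry = {}

model = train_model(train)
ok, mae = validate_model(model, holdout)
if ok:
    deploy_model(model, registry)
```

In practice each stage would run as a separate Kubeflow Pipelines component and the registry would be MLflow or Weights & Biases, but the gating logic — no deployment unless the validation metric passes — is the same.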

Required Skills and Experience

Core MLOps/LLMOps Expertise

  • 5+ years of hands-on experience in MLOps/DevOps for AI/ML.
  • 2+ years working with LLMs in production (e.g., fine-tuning, inference optimization, safety evaluations).
  • Strong experience with Kubeflow Pipelines, KServe, and MLflow.
  • Deep knowledge of CI/CD pipelines with GitHub Actions, GitLab CI, or CircleCI.
  • Expertise in Kubernetes, Helm, and Terraform for container orchestration and infrastructure as code.

Programming & Frameworks

  • Proficiency in Python, with experience in ML libraries such as scikit-learn, TensorFlow, PyTorch, and Hugging Face Transformers.
  • Familiarity with FastAPI, Flask, or gRPC for building ML model APIs.

Cloud & DevOps

  • Hands-on experience with AWS, Azure, or GCP (preferred: EKS, S3, SageMaker, Vertex AI, Azure ML).
  • Knowledge of model serving using Triton Inference Server, TorchServe, or ONNX Runtime.

Monitoring & Logging

  • Tools: Prometheus, Grafana, ELK, OpenTelemetry, Sentry.
  • Experience with model drift detection and A/B testing in production environments.
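As a concrete (and deliberately simplified) illustration of the drift detection mentioned above, here is a dependency-free sketch that flags drift when a live feature window's mean moves several standard errors away from a reference window. Production systems more often use PSI or Kolmogorov–Smirnov tests; all names and data here are illustrative:

```python
import statistics

def mean_shift_drift(reference, live, threshold=3.0):
    """Flag drift when the live window's mean departs from the reference
    mean by more than `threshold` standard errors (a simple z-score
    heuristic; PSI or KS tests are more common in production)."""
    ref_mean = statistics.fmean(reference)
    ref_sd = statistics.stdev(reference)
    standard_error = ref_sd / (len(live) ** 0.5)
    z = abs(statistics.fmean(live) - ref_mean) / standard_error
    return z > threshold, z

# Deterministic toy data: a repeating 0..6 pattern (mean 3.0).
reference = [float(i % 7) for i in range(700)]  # training-time feature values
stable = [float(i % 7) for i in range(70)]      # live window, same distribution
shifted = [x + 2.0 for x in stable]             # live window, mean shifted by 2

drifted_stable, _ = mean_shift_drift(reference, stable)
drifted_shifted, _ = mean_shift_drift(reference, shifted)
```

In a monitoring stack, a check like this would run on a schedule and emit a Prometheus metric or alert whenever the drift flag trips.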

Soft Skills

  • Strong problem-solving and debugging skills.
  • Ability to mentor junior engineers and collaborate with cross-functional teams.
  • Clear communication, documentation, and Agile/Scrum proficiency.

Preferred Qualifications

  • Experience with LLMOps platforms like Weights & Biases, TruEra, PromptLayer, or LangSmith.
  • Experience with multi-tenant LLM serving or agentic systems (LangChain, Semantic Kernel).
  • Prior exposure to Responsible AI practices (bias detection, explainability, fairness).

More Info

Open to candidates from: Indian

Job ID: 117339649