Exl

Senior Machine Learning Engineer

  • Posted 2 days ago

Job Description

About the Role

We are looking for an experienced LLM Ops Engineer to own the end-to-end lifecycle of LLM applications in production, from model selection and pipeline design through fine-tuning, deployment, observability, and continuous improvement. This role sits at the intersection of ML Engineering, DevOps, and Data Engineering, and is critical to ensuring that GenAI systems are reliable, cost-efficient, and scalable in enterprise environments. You will partner closely with AI Research, Product, Platform, and Data Engineering teams.

Responsibilities

  • Design, build, and maintain end-to-end LLM pipelines - from data ingestion and pre-processing through model training, fine-tuning, and deployment into production.
  • Implement and manage CI/CD pipelines for ML/LLM workflows using tools such as MLflow, Kubeflow, and GitHub Actions, ensuring reproducibility and fast iteration cycles.
  • Own model lifecycle management: versioning, A/B testing, canary deployments, rollbacks, and governance - ensuring models are always production-safe.
  • Architect and operate LLM serving infrastructure on cloud or on-premises with high availability, low latency, and cost efficiency.
  • Build robust monitoring, observability, and alerting frameworks for model drift, hallucinations, latency, token costs, and quality regressions, using tools such as LangSmith and Weights & Biases.
  • Work on RAG pipelines with vector databases, and drive model fine-tuning initiatives for domain-specific applications.
  • Establish and enforce LLMOps best practices including prompt versioning, evaluation frameworks, guardrails, PII policies, and audit trails.
  • Manage AI Gateway and model routing across multiple LLM providers (OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Vertex AI) with unified auth, rate limiting, and fallback logic.
  • Optimise inference costs through quantisation, batching strategies, hardware (GPU/TPU) optimisation, and model compression.
  • Mentor junior engineers and contribute to internal documentation and platform tooling.

Qualifications

  • B.Tech / M.Tech in CS, AI/ML, Mathematics or equivalent.

Required Skills

  • Languages: Python (advanced)
  • Frameworks: LangChain, LangGraph, Hugging Face, PyTorch, TensorFlow
  • MLOps / Pipeline Tools: MLflow, Kubeflow, Apache Airflow, Prefect
  • DevOps / Infra: Docker, Kubernetes, GitHub Actions
  • Cloud Platforms: AWS Bedrock, Azure OpenAI, Google Vertex AI

Preferred Skills

  • Experience with RAG and vector databases, fine-tuning (LoRA, PEFT), LLM observability (LangSmith, Weights & Biases, or similar), and prompt evaluation.
  • Good to have: security governance (LLM red-teaming, PII redaction, AI safety guardrails) and streaming (event-driven architecture).

Experience

6 to 10+ years overall in software / ML engineering

3+ years of hands-on experience with the production LLM/ML lifecycle

Equal Opportunity Statement

We are committed to diversity and inclusivity.

Job ID: 148089179
