MLOps/LLMOps Engineer

8-10 Years

Job Description

Key Responsibilities:

Design, build, and maintain CI/CD pipelines for ML model training, validation, and deployment
Automate and optimize ML workflows, including data ingestion, feature engineering, model training, and monitoring
Deploy, monitor, and manage LLMs and other ML models in production (on-premises and/or cloud)
Implement model versioning, reproducibility, and governance best practices
Collaborate with data scientists, ML engineers, and software engineers to streamline the end-to-end ML lifecycle
Ensure security, compliance, and scalability of ML/LLM infrastructure
Troubleshoot and resolve issues related to ML model deployment and serving
Evaluate and integrate new MLOps/LLMOps tools and technologies
Mentor junior engineers and contribute to best practices documentation

Required Skills & Qualifications:

8+ years of experience in DevOps, with at least 3 years in MLOps/LLMOps
Strong experience with cloud platforms (AWS, Azure, GCP), containerization (Docker), and container orchestration (Kubernetes)
Proficient in CI/CD tools (Jenkins, GitHub Actions, GitLab CI, etc.)
Hands-on experience deploying and managing a range of AI models (e.g., OpenAI, Hugging Face, custom models) for use in building solutions
Experience with model serving tools such as TGI, vLLM, BentoML, etc.
Solid scripting and programming skills (Python, Bash, etc.)
Familiarity with monitoring/logging tools (Prometheus, Grafana, ELK stack)
Strong understanding of security and compliance in ML environments

Preferred Skills:

Knowledge of model explainability, drift detection, and model monitoring
Familiarity with data engineering tools (Spark, Kafka, etc.)
Knowledge of data privacy, security, and compliance in AI systems
Strong communication skills for collaborating effectively with stakeholders
Strong critical thinking and problem-solving skills
Proven ability to lead and manage projects with cross-functional teams