Location: Gurgaon
Experience: 5-7 years
Job Description
- We are looking for a talented MLOps Engineer to help operationalize machine learning models at scale.
- The ideal candidate will have a strong background in machine learning, software engineering, and DevOps practices, with expertise in deploying, monitoring, and maintaining ML models in production environments.
- The candidate should have worked with, or at least have a good understanding of, LangChain, LangGraph, LangSmith, grounding techniques, RAG, embeddings, and related areas in order to build GenAI-based solutions, along with Python and MLOps skills.
- Strong experience in MLOps, DevOps, or related fields.
- Proficiency in Python and experience with ML frameworks such as TensorFlow, PyTorch, or Scikit-learn.
- Hands-on experience with cloud platforms (e.g., AWS, GCP, or Azure) and their ML services.
- Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes).
- Experience with CI/CD tools (e.g., GitHub Actions or Jenkins).
- Familiarity with monitoring tools for ML models (e.g., Dynatrace, Prometheus, Grafana, or MLflow).
- Strong understanding of version control for models and data (e.g., Git).
- Knowledge of scripting in Python and Unix Bash.
Roles & Responsibilities
- Good communication and coordination skills; proactive in nature.
- Self-driven, customer-centric, and innovative.
- Check deployment pipelines for machine learning models.
- Review code changes and pull requests from the data science team.
- Trigger CI/CD pipelines after code approvals.
- Monitor pipelines and ensure all tests pass and model artifacts are generated and stored correctly.
- Deploy updated models to production after pipeline completion.
- Work closely with the software engineering and DevOps teams to ensure smooth integration.
- Containerize models using Docker and deploy on cloud platforms (like AWS/GCP/Azure).
- Set up monitoring tools to track various metrics like response time, error rates, and resource utilization.
- Establish alerts and notifications to quickly detect anomalies or deviations from expected behavior.
- Analyze monitoring data, log files, and system metrics.
- Collaborate with the data science team to develop updated pipelines to cover any faults.
- Document troubleshooting steps, changes, and optimizations.
Mandatory Skills: Python, MLflow, GenAI, RAG, LangChain, LangGraph, Cloud, DevOps, CI/CD.