
anthrobyte.ai

Sr DevOps Engineer/Lead

  • Posted 2 hours ago

Job Description

We are looking for a hands-on Senior / Lead DevOps Engineer to build, automate, and scale modern cloud-native platforms for GenAI, AI/ML, and production applications. This role will own CI/CD pipelines, Kubernetes-based deployments, platform reliability, observability, and secure production releases across development, staging, and production environments.

Requirements

Role Overview

The role combines DevOps, platform engineering, and MLOps responsibilities, with a focus on deploying and maintaining AI/ML or GenAI workloads in production. Typical responsibilities in comparable roles include automating model and application deployment, integrating Kubernetes with CI/CD pipelines, monitoring production health, and improving scalability, reliability, and security.

Key Responsibilities

  • Design, build, and maintain scalable CI/CD pipelines for application, API, and GenAI/ML workload deployments.
  • Manage Kubernetes infrastructure across dev, test, staging, and production environments.
  • Automate build, test, release, rollback, and deployment workflows using DevOps best practices.
  • Deploy and support containerized services using Docker and Kubernetes.
  • Enable production deployment of AI/ML models, GenAI applications, RAG pipelines, and related services.
  • Implement monitoring, logging, alerting, and incident response for platform and production systems.
  • Improve platform reliability, scalability, availability, and cost efficiency.
  • Collaborate with software, data, AI/ML, and product teams to move solutions from development to production.
  • Enforce security, access control, compliance, and infrastructure standards in cloud environments.
  • Create technical documentation, deployment runbooks, and operational SOPs.

Required Skills

  • 4+ years of experience in DevOps, Platform Engineering, SRE, or MLOps-related roles.
  • Strong hands-on experience with CI/CD tools such as Jenkins, GitHub Actions, GitLab CI/CD, or Azure DevOps.
  • Solid experience with Kubernetes, cluster operations, Helm, and container orchestration.
  • Good knowledge of Docker, Linux, Bash/Shell, and Python scripting for automation.
  • Experience with cloud platforms such as AWS, Azure, or GCP.
  • Knowledge of Infrastructure as Code tools such as Terraform or CloudFormation.
  • Strong understanding of production deployment, release engineering, rollback strategies, and environment separation.
  • Experience with observability tools such as Prometheus, Grafana, ELK, Datadog, or similar platforms.
  • Familiarity with security best practices, IAM, secrets management, and vulnerability remediation.

Preferred Skills

  • Experience supporting GenAI, LLMOps, MLOps, or AI platform workloads in production.
  • Familiarity with OpenAI, Azure OpenAI, Hugging Face, LangChain, LlamaIndex, or related AI tools/frameworks.
  • Exposure to MLflow, Databricks, model registry workflows, and model lifecycle management.
  • Experience with GitOps tools such as Argo CD or Flux.
  • Understanding of model monitoring, drift detection, and AI service reliability.

Lead-Level Expectations

At the Lead level, the role also covers team guidance, architecture ownership, cross-functional coordination, and operational standards for deployment and platform reliability. Lead candidates are also expected to mentor engineers, improve engineering processes, and drive scalable platform strategy across AI and production systems.

  • Lead DevOps and platform architecture decisions for cloud-native and GenAI workloads.
  • Mentor junior engineers and review infrastructure, automation, and deployment practices.
  • Standardize CI/CD, release governance, observability, and security controls across teams.
  • Partner with engineering and AI teams to improve delivery speed and production resilience.




Job ID: 147371109