We are looking for a hands-on Senior / Lead DevOps Engineer to build, automate, and scale modern cloud-native platforms for GenAI, AI/ML, and production applications. This role will own CI/CD pipelines, Kubernetes-based deployments, platform reliability, observability, and secure production releases across development, staging, and production environments.
Requirements
Role Overview
The role combines DevOps, platform engineering, and MLOps responsibilities, with a focus on deploying and maintaining AI/ML or GenAI workloads in production. Typical responsibilities in comparable roles include automating model and application deployment, integrating Kubernetes with CI/CD pipelines, monitoring production health, and improving scalability, reliability, and security.
Key Responsibilities
- Design, build, and maintain scalable CI/CD pipelines for application, API, and GenAI/ML workload deployments.
- Manage Kubernetes infrastructure across dev, test, staging, and production environments.
- Automate build, test, release, rollback, and deployment workflows using DevOps best practices.
- Deploy and support containerized services using Docker and Kubernetes.
- Enable production deployment of AI/ML models, GenAI applications, RAG pipelines, and related services.
- Implement monitoring, logging, alerting, and incident response for platform and production systems.
- Improve platform reliability, scalability, availability, and cost efficiency.
- Collaborate with software, data, AI/ML, and product teams to move solutions from development to production.
- Enforce security, access control, compliance, and infrastructure standards in cloud environments.
- Create technical documentation, deployment runbooks, and operational SOPs.
Required Skills
- 4+ years of experience in DevOps, Platform Engineering, SRE, or MLOps-related roles.
- Strong hands-on experience with CI/CD tools such as Jenkins, GitHub Actions, GitLab CI/CD, or Azure DevOps.
- Solid experience with Kubernetes, cluster operations, Helm, and container orchestration.
- Good knowledge of Docker, Linux, Bash/Shell, and Python scripting for automation.
- Experience with cloud platforms such as AWS, Azure, or GCP.
- Knowledge of Infrastructure as Code tools such as Terraform or CloudFormation.
- Strong understanding of production deployment, release engineering, rollback strategies, and environment separation.
- Experience with observability tools such as Prometheus, Grafana, ELK, Datadog, or similar platforms.
- Familiarity with security best practices, IAM, secrets management, and vulnerability remediation.
Preferred Skills
- Experience supporting GenAI, LLMOps, MLOps, or AI platform workloads in production.
- Familiarity with OpenAI, Azure OpenAI, Hugging Face, LangChain, LlamaIndex, or related AI tools/frameworks.
- Exposure to MLflow, Databricks, model registry workflows, and model lifecycle management.
- Experience with GitOps tools such as Argo CD or Flux.
- Understanding of model monitoring, drift detection, and AI service reliability.
Lead-Level Expectations
At the Lead level, the role also includes team guidance, architecture ownership, cross-functional coordination, and setting operational standards for deployment and platform reliability. Lead engineers are also expected to mentor others, improve engineering processes, and drive scalable platform strategy across AI and production systems.
- Lead DevOps and platform architecture decisions for cloud-native and GenAI workloads.
- Mentor junior engineers and review infrastructure, automation, and deployment practices.
- Standardize CI/CD, release governance, observability, and security controls across teams.
- Partner with engineering and AI teams to improve delivery speed and production resilience.