Primary & Mandatory Skills: DevOps, Kubernetes cluster administration, containerization & orchestration, CI/CD integration, Docker, OpenShift in on-prem environments, scripting.
Location: Chennai & Bangalore
Key Responsibilities:
Kubernetes Cluster Administration
- Administer, monitor, and maintain production-grade Kubernetes clusters deployed in on-prem datacentres.
- Perform cluster lifecycle operations including upgrades, patching, node provisioning, and capacity planning.
- Implement and manage RBAC, network policies, Pod Security Policies, and namespaces for multi-tenant environments.
- Maintain Ingress controllers, service meshes, and API gateways.
- Troubleshoot cluster-level issues including node failures, pod scheduling, and resource bottlenecks.
Containerization & Orchestration
- Build and manage Docker images, Compose files, and private registries.
- Deploy and orchestrate microservices using Kubernetes, Helm charts, and Red Hat OpenShift.
- Optimize container resource usage, autoscaling policies, and affinity/anti-affinity rules.
CI/CD Integration
- Design and maintain CI/CD pipelines using DevOps tools.
- Automate deployment of containerized AI applications into Kubernetes clusters.
- Develop reusable pipeline templates and scripts for rapid onboarding and POC delivery.
AI Workflow Enablement (ClearML)
- Integrate ClearML for experiment tracking, model versioning, and pipeline orchestration.
- Collaborate with AI/ML teams to deploy containerized models and automate GPU job scheduling.
- Build custom ClearML agents and workflows for reproducible experimentation and deployment.
- Exposure to, and prior knowledge of, GPU segmentation and resource management using BCM and NVAIE is desirable.
- Knowledge of vector databases is good to have.
Scripting & Tooling
- Develop automation scripts in Shell and Python.
- Build internal tools to streamline cluster operations and observability.