Role Summary
We are seeking an experienced OpenShift & DevOps Platform Lead/Manager to lead the design, implementation, automation, and operation of enterprise container platforms based on Red Hat OpenShift. The candidate will manage OpenShift clusters, drive DevOps transformation, build CI/CD pipelines, and lead a team that supports mission-critical production environments.
This role requires deep expertise in Kubernetes/OpenShift, automation, CI/CD, infrastructure as code, and strong leadership experience managing production platform teams.
Key Responsibilities
Platform Ownership & Leadership
- Lead and manage OpenShift platform operations across Production, DR, and Non-Production environments
- Manage and mentor DevOps and OpenShift engineers (L2/L3 support)
- Define platform standards, governance, and best practices
- Ensure platform availability, scalability, and performance
- Drive DevOps transformation and automation initiatives
OpenShift Cluster Management
- Design, deploy, configure, and manage Red Hat OpenShift clusters (OCP 4.16 and above)
- Manage cluster components, including:
  - Control plane and worker nodes
  - Operators and CRDs
  - Ingress, Routes, and Load Balancers
  - Persistent storage and volumes
- Perform cluster upgrades, patching, and lifecycle management
- Troubleshoot cluster, node, and application issues
- Ensure cluster security, compliance, and performance
DevOps & CI/CD Implementation
- Design and manage CI/CD pipelines using tools such as:
  - Jenkins
  - GitLab CI/CD
  - Azure DevOps
  - GitHub Actions
- Automate application build, test, and deployment processes
- Implement GitOps workflows using Argo CD or similar tools
- Enable automated deployments to OpenShift clusters
Infrastructure Automation & IaC
- Implement Infrastructure as Code using:
  - Ansible
  - Terraform
  - Helm charts
- Automate cluster provisioning, configuration, and scaling
- Automate operational tasks such as patching, deployment, and configuration
Containerization & Kubernetes Management
- Support containerized workloads using Docker / Podman
- Manage Kubernetes objects, including:
  - Pods
  - Deployments
  - StatefulSets
  - Services
  - ConfigMaps and Secrets
- Troubleshoot container and orchestration issues
Incident Management & Platform Reliability
- Own platform availability and reliability
- Lead critical incident resolution (P1/P2)
- Drive Root Cause Analysis (RCA) and preventive measures
- Reduce MTTR and improve uptime and platform stability
- Ensure SLA and SLO compliance
Security & Compliance
- Implement RBAC, security policies, and access controls
- Manage secrets, certificates, and secure deployments
- Ensure platform compliance with security and audit requirements
- Implement vulnerability management and patching