Key Responsibilities:
- OpenShift Cluster Management: Administer and manage multiple OpenShift clusters across different environments (on-premise and cloud), ensuring high availability, stability, and scalability.
- Troubleshooting & Incident Resolution: Provide L2 support for OpenShift-related incidents, resolve issues escalated from L1 support, and investigate root causes of cluster or container failures.
- Cluster Provisioning: Deploy and configure new OpenShift environments, ensuring they are aligned with organizational standards and optimized for performance and security.
- Container Management: Assist developers in deploying applications using Docker containers on OpenShift, ensuring proper environment configuration and resource allocation.
- Performance Tuning & Optimization: Monitor cluster performance and tune resource allocation to ensure efficient utilization of compute, storage, and networking resources.
- Security & Compliance: Enforce security best practices, including configuring role-based access control (RBAC) and security policies, and ensure compliance with organizational and industry security standards.
- Log Management: Manage and analyze logs and metrics using OpenShift tools (e.g., `oc logs`, Prometheus, Grafana) to identify performance bottlenecks and troubleshoot issues.
- Patch Management: Regularly update and patch OpenShift clusters so they stay current with the latest features, bug fixes, and security updates.
- Backup & Disaster Recovery: Ensure robust backup and disaster recovery strategies are in place for OpenShift deployments to minimize downtime and data loss.
- Automation & Scripting: Automate routine tasks using Ansible, Bash, or Python scripts to improve efficiency and reduce manual intervention.
- CI/CD Pipeline Support: Support and integrate CI/CD pipelines built with tools like Jenkins or GitLab, working closely with developers to ensure smooth continuous delivery and deployment of code.
- Documentation & Reporting: Maintain comprehensive documentation on cluster configurations, processes, troubleshooting procedures, and best practices. Provide regular reports on system performance, incidents, and upgrades.
- Collaboration with L3 Team: Work with the L3 support team on complex issues, providing feedback, contributing to root cause analysis, and suggesting improvements to the infrastructure.
- Training & Knowledge Sharing: Provide training to L1 and junior engineers on OpenShift administration, container orchestration, and best practices.
Required Qualifications & Skills:
- 3 to 5 years of hands-on experience with OpenShift administration and management.
- Strong knowledge of OpenShift 3.x/4.x architecture and components (e.g., Pods, Nodes, Deployments, Services, Ingress Controllers).
- Experience with Kubernetes (since OpenShift is built on Kubernetes) and a good understanding of Kubernetes clusters, containers, and container orchestration.
- Expertise in Linux-based systems (RHEL, CentOS, Ubuntu), as OpenShift primarily runs on Linux.
- Familiarity with containerization tools like Docker and container registries.
- Solid understanding of CI/CD tools (e.g., Jenkins, GitLab CI, OpenShift Pipelines) and deployment automation in a containerized environment.
- Experience with networking concepts like DNS, Load Balancing, Network Policies, and Ingress controllers.
- Familiarity with Ansible, Helm, or similar automation tools for managing OpenShift clusters.
- Experience with Prometheus for monitoring and Grafana for visualization in the OpenShift environment.
- Hands-on experience with role-based access control (RBAC) and security policies in OpenShift.
- Familiarity with cloud platforms like AWS, Azure, or GCP, and the deployment of OpenShift on these platforms.
- Strong problem-solving, troubleshooting, and debugging skills.
- Ability to work in a fast-paced environment, manage multiple tasks, and prioritize effectively.
- Strong written and verbal communication skills for interacting with developers, L1 support, and cross-functional teams.