Job Summary:
Experienced Systems Administrator with a strong foundation in Linux, infrastructure management, and incident response, skilled in monitoring, troubleshooting, and maintaining reliable systems across virtualized and cloud-based environments.
Job Responsibilities:
- Manage and optimize Linux systems with a focus on performance, reliability, and troubleshooting
- Troubleshoot network issues, including latency, packet drops, and connectivity problems
- Deploy and scale workloads on cloud platforms (AWS/GCP/Azure)
- Deploy and manage containerized applications using Docker and Kubernetes, including cluster troubleshooting and scaling
- Build and maintain monitoring systems using Prometheus, Grafana, and ELK
- Create dashboards, alerts, and PromQL queries
- Automate operational tasks using Python or Bash scripts
- Manage CI/CD pipelines (Jenkins/GitLab CI)
- Handle P1/P2 incidents, lead incident bridge calls, and perform root cause analysis (RCA)
Key Skills:
- Strong Linux fundamentals
- Good understanding of networking (TCP/IP, DNS, HTTP/HTTPS, load balancing)
- Hands-on experience with Docker and Kubernetes (must-have)
- Experience with cloud platforms (AWS/GCP/Azure)
- Knowledge of monitoring tools (Prometheus, Grafana, ELK)
- Proficiency in Python or Bash scripting
- Experience with CI/CD tools (Jenkins/GitLab CI)
- Strong incident management and troubleshooting skills
Good to Have:
- Exposure to Terraform or Ansible
Qualifications:
- Bachelor's degree in Computer Science or Engineering (BE/B.Tech), or a master's degree such as MCA or M.Sc (IT).