Crest Data Systems

Sr. Site Reliability Engineer

  • Posted 21 hours ago

Job Description

Job Summary:

Seeking an experienced Systems Administrator with a strong foundation in Linux, infrastructure management, and incident response, and with skills in monitoring, troubleshooting, and maintaining reliable systems across virtualized and cloud-based environments.

Job Responsibilities

  • Manage and optimize Linux systems with a focus on performance, reliability, and troubleshooting
  • Troubleshoot network issues, including latency, packet drops, and connectivity failures
  • Work on cloud platforms (AWS/GCP/Azure) for deployment and scaling
  • Deploy and manage applications using Docker and Kubernetes (cluster troubleshooting & scaling)
  • Build and maintain monitoring systems using Prometheus, Grafana, and ELK
  • Create dashboards, alerts, and PromQL queries
  • Automate tasks using Python/Bash scripting
  • Manage CI/CD pipelines (Jenkins/GitLab CI)
  • Handle P1/P2 incidents, lead incident bridges, and perform root cause analysis (RCA)

Key Skills

  • Strong Linux fundamentals
  • Good understanding of networking (TCP/IP, DNS, HTTP/HTTPS, load balancing)
  • Hands-on experience with Docker & Kubernetes (must-have)
  • Experience with cloud platforms (AWS/GCP/Azure)
  • Knowledge of monitoring tools (Prometheus, Grafana, ELK)
  • Proficiency in Python or Bash scripting
  • Experience with CI/CD tools (Jenkins/GitLab CI)
  • Strong incident management and troubleshooting skills

Good to Have:

  • Exposure to Terraform or Ansible

Qualifications:

  • Bachelor's degree in Computer Science, Engineering (BE/B.Tech), MCA, or M.Sc (IT).

Job ID: 145743703