techwise digital

Site Reliability Engineer

Job Description

We are looking for a skilled Site Reliability Engineer (SRE) with strong expertise in Google Cloud Platform (GCP) and Java-based applications. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of production systems, while driving automation and operational excellence.

Key Responsibilities

  • Ensure high availability, performance, and scalability of applications hosted on GCP.
  • Design, build, and maintain reliable and scalable infrastructure using SRE principles.
  • Monitor system health using tools such as Cloud Monitoring (formerly Stackdriver), Prometheus, and Grafana.
  • Troubleshoot production issues across services, application layers, and infrastructure.
  • Collaborate with development teams to improve the reliability and performance of Java-based applications.
  • Implement CI/CD pipelines and automate deployments.
  • Develop and maintain runbooks, playbooks, and incident response processes.
  • Drive incident management, root cause analysis (RCA), and postmortems.
  • Optimize cost, performance, and resource utilization on GCP.
  • Implement observability frameworks covering logging, tracing, and alerting.

Required Skills

  • Strong experience with Google Cloud Platform (GCP) services (Compute Engine, GKE, Cloud Run, BigQuery, Cloud Storage).
  • Proficiency in developing Java/J2EE applications and debugging them in production.
  • Experience with containerization (Docker) and orchestration tools like Kubernetes (GKE).
  • Hands-on experience with monitoring and logging tools (Prometheus, Grafana, ELK, Cloud Monitoring).

Skills: Java, GCP, DevOps

Job ID: 145312803