Site Reliability Engineer

techwise digital

Hyderabad, India

Fresher

Posted 17 hours ago

Job Description

We are looking for a skilled Site Reliability Engineer (SRE) with strong expertise in Google Cloud Platform (GCP) and Java-based applications. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of production systems, while driving automation and operational excellence.

Key Responsibilities

Ensure high availability, performance, and scalability of applications hosted on GCP.
Design, build, and maintain reliable and scalable infrastructure using SRE principles.
Monitor system health using tools such as Cloud Monitoring (formerly Stackdriver), Prometheus, and Grafana.
Troubleshoot production issues across services, application layers, and infrastructure.
Collaborate with development teams to improve the reliability and performance of Java-based applications.
Implement CI/CD pipelines and automate deployments.
Develop and maintain runbooks, playbooks, and incident response processes.
Drive incident management, root cause analysis (RCA), and postmortems.
Optimize cost, performance, and resource utilization on GCP.
Implement observability frameworks covering logging, tracing, and alerting.

Required Skills

Strong experience with Google Cloud Platform (GCP) services (Compute Engine, GKE, Cloud Run, BigQuery, Cloud Storage).
Proficiency with Java/J2EE applications, including debugging production issues.
Experience with containerization (Docker) and orchestration tools such as Kubernetes (GKE).
Hands-on experience with monitoring and logging tools (Prometheus, Grafana, ELK, Cloud Monitoring).

Skills: Java, GCP, DevOps