Landmark Group

Site Reliability Engineer

  • Posted 13 days ago
  • Over 50 applicants

Job Description

What You'll Do:

Ensure reliability and high availability of Java and microservices-based applications through proactive monitoring and automation.

Define and track SLIs/SLOs to maintain service performance and stability.

Troubleshoot and resolve production issues, performing detailed root cause analysis to prevent recurrence.

Build and enhance observability using Prometheus, Grafana, Loki, or New Relic.

Automate operational tasks such as deployments, scaling, rollbacks, diagnostics, and alerting.

Collaborate with engineering and DevOps teams to integrate reliability practices into the CI/CD pipeline.

Drive AIOps initiatives for intelligent alert correlation and predictive incident management.

Mentor teams on best practices in monitoring, performance optimization, and operational efficiency.

What We're Looking For:

3-6 years of experience in Site Reliability Engineering, Application Operations, or DevOps.

Strong hands-on experience with Java, Spring Boot, and microservices architecture.

Proficiency in monitoring tools (Prometheus, Grafana, Loki, New Relic, or similar).

Experience with Kubernetes, containers, and cloud platforms (AWS, Azure, or GCP).

Strong scripting skills in Bash, Python, or Go for automation and diagnostics.

Familiarity with incident management, root cause analysis (RCA), and performance debugging.

Exposure to AIOps tools or AI/LLM-based observability platforms is a plus.

Excellent problem-solving and communication skills.

Job ID: 133297879
