Site Reliability Engineer (SRE) - Observability & Azure Infrastructure

Experience: 5-8 years

Job Description

Key Responsibilities

Observability Platform Implementation:

  • Design and maintain distributed tracing, metrics, and logging using OpenTelemetry, Prometheus, Loki, and Tempo.
  • Ensure complete instrumentation of .NET Core applications for end-to-end visibility.
  • Implement telemetry pipelines for application logs, performance metrics, and traces.

Monitoring & Alerting:

  • Develop and manage SLIs, SLOs, and error budgets.
  • Create actionable, noise-free alerts using Prometheus Alertmanager and Azure Monitor.
  • Monitor key infrastructure components, applications, and databases with a focus on reliability and performance.

Azure & Infrastructure Integration:

  • Integrate Azure services (App Services, VMs, Storage, etc.) with the observability stack.
  • Configure monitoring for MSSQL databases, including performance tuning metrics and health indicators.
  • Use Azure Monitor, Log Analytics, and custom exporters where necessary.

Automation & DevOps:

  • Automate observability configurations using Terraform, PowerShell, or other IaC tools.
  • Integrate telemetry validation and health checks into CI/CD pipelines.
  • Maintain observability as code for repeatable deployments and easy scaling.

Resilience & Reliability Engineering:

  • Conduct capacity planning to anticipate scaling needs based on usage patterns and growth.
  • Define and implement disaster recovery strategies for critical Azure-hosted services and databases.
  • Perform load and stress testing to identify performance bottlenecks and validate infrastructure limits.
  • Support release engineering by integrating observability checks and rollback strategies in CI/CD pipelines.
  • Apply chaos engineering practices in lower environments to uncover potential reliability risks proactively.

Collaboration & Documentation:

  • Partner with engineering teams to promote observability best practices in .NET Core development.
  • Create dashboards (Grafana preferred) and runbooks for system insights and incident response.
  • Document monitoring standards, troubleshooting guides, and onboarding materials.
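The SLI, SLO, and error-budget work listed above comes down to simple arithmetic. The following is a minimal Python sketch that assumes a request-based availability SLI, an example 99.9% target, and a 30-day window. These are illustrative values, not figures from this role. The names `SLO`, `budget_remaining`, and `burn_rate` are hypothetical helpers, not part of any library or of Keka's tooling.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """Hypothetical SLO definition: a success-ratio target over a rolling window."""
    target: float      # e.g. 0.999 means 99.9% of requests must succeed (example value)
    window_days: int   # length of the evaluation window in days (example value)

    def error_budget_fraction(self) -> float:
        # Share of requests allowed to fail within the window.
        return 1.0 - self.target

    def allowed_downtime_minutes(self) -> float:
        # The same budget expressed as minutes of full outage in the window.
        return self.error_budget_fraction() * self.window_days * 24 * 60


def budget_remaining(slo: SLO, total: int, failed: int) -> float:
    """Unspent share of the error budget: 1.0 untouched, below 0 exhausted."""
    allowed_failures = slo.error_budget_fraction() * total
    if allowed_failures == 0:
        return 1.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures


def burn_rate(slo: SLO, total: int, failed: int) -> float:
    """Observed error ratio divided by the ratio the SLO allows (1.0 = on pace)."""
    if total == 0:
        return 0.0
    budget = slo.error_budget_fraction()
    if budget == 0:
        return 0.0 if failed == 0 else float("inf")
    return (failed / total) / budget


if __name__ == "__main__":
    slo = SLO(target=0.999, window_days=30)
    print(f"{slo.allowed_downtime_minutes():.1f} min")      # 43.2 min
    print(f"{budget_remaining(slo, 1_000_000, 250):.2f}")  # 0.75
    print(f"{burn_rate(slo, 10_000, 20):.1f}")             # 2.0
```

A burn rate above 1.0 means the budget will be exhausted before the window ends. Burn-rate alerting rules, evaluated by Prometheus and routed through Alertmanager, are one common way to turn this number into low-noise alerts.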

Required Skills and Experience

  • 4+ years of experience in SRE, DevOps, or infrastructure-focused roles.
  • Deep experience with .NET Core application observability using OpenTelemetry.
  • Proficiency with Prometheus, Loki, Tempo, and related observability tools.
  • Strong background in Azure infrastructure monitoring, including App Services and VMs.
  • Hands-on experience monitoring MSSQL databases (deadlocks, query performance, etc.).
  • Familiarity with Infrastructure as Code (Terraform, Bicep) and scripting (PowerShell, Bash).
  • Experience building and tuning alerts, dashboards, and metrics for production systems.

Preferred Qualifications

  • Azure certifications (e.g., AZ-104, AZ-400).
  • Experience with Grafana, Azure Monitor, and Log Analytics integration.
  • Familiarity with distributed systems and microservice architectures.
  • Prior experience in high-availability, regulated, or customer-facing environments.

More Info

Open to candidates from: Indian

About Company

Keka has been a silent revolution in the making since our launch 7 years ago. Our steadfast focus on building an employee-centric HR platform was well received by more than 8500 businesses across India and the world. Today we are India's #1 platform in the segment, with the greatest number of new customers adopting the platform. All with zero advertising spend and pure customer love. We are an organization built by our employees. The passion and the extreme ownership that our people bring to the table are contagious. We don't hide our shortcomings and we aren't afraid to ask for help. When we fail, we learn, adapt, and do better in the future. This open culture encourages our people to innovate, regardless of their function and across departmental boundaries.
We are looking for Associate Product Marketing Managers (APMMs) to join our team. Our APMM will be a superb communicator and will manage product launches, competitive intelligence, and sales enablement to grow revenue and drive product adoption.
You'll be embedded in our small but mighty PMM team based out of Bangalore / Hyderabad, partnering with Keka's senior leadership, product, sales, and partnerships teams.

Job ID: 121400835