GSPANN Technologies, Inc.

Site Reliability Engineer - Datadog

Job Description

Site Reliability Engineering (SRE), Datadog, Splunk, Grafana, Continuous Integration / Continuous Delivery (CI/CD), Amazon Web Services, Microsoft Azure, Google Cloud Platform

Description

GSPANN is hiring a Site Reliability Engineer with expertise in Datadog to design and manage enterprise observability and monitoring solutions. The role focuses on improving system reliability, implementing SLO-driven practices, and driving automation across cloud and distributed environments.

Location: Hyderabad

Role Type: Full Time

Published On: 31 March 2026

Experience: 5+ Years

Role and Responsibilities

  • Design, implement, and maintain monitoring, logging, and distributed tracing solutions using Datadog.
  • Build Service Level Objective (SLO), Service Level Agreement (SLA), and status dashboards to provide real-time visibility into system health and performance.
  • Collaborate with engineering, infrastructure, and business teams to integrate observability practices into applications and platforms.
  • Identify gaps in monitoring coverage and recommend improvements to enhance visibility and reliability.
  • Drive automation for efficient collection, storage, and analysis of observability data.
  • Support incident response activities, perform root cause analysis (RCA), and contribute to problem management processes.
  • Establish and enforce best practices for system reliability and monitoring standards.
  • Balance operational support responsibilities with strategic reliability and performance improvement initiatives.
  • Analyze system trends to proactively prevent incidents and performance degradation.
  • Recommend and implement solutions to improve system reliability, scalability, and resilience.
  • Stay updated with industry trends in Site Reliability Engineering (SRE) and observability practices.
  • Mentor junior engineers and promote a culture of continuous learning and improvement.

Skills and Experience

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • 5–8+ years of experience in Software Engineering, Site Reliability Engineering (SRE), or operations roles with a strong focus on observability.
  • Strong hands-on experience with Datadog, including Application Performance Monitoring (APM), Infrastructure Monitoring, Log Management, Real User Monitoring (RUM), and Synthetic Monitoring.
  • Experience with logging platforms, metrics collection systems, and distributed tracing frameworks.
  • Ability to define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.
  • Strong analytical and troubleshooting skills for diagnosing and resolving complex system issues.
  • Effective communication and collaboration with cross-functional teams.
  • Experience driving automation initiatives to improve system reliability and operational efficiency.
  • Experience with additional monitoring and observability tools such as Splunk, Grafana, AppDynamics, and Prometheus.
  • Experience with cloud platforms including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
  • Hands-on experience with Kubernetes and container-based observability.
  • Experience implementing Infrastructure as Code (IaC) practices using tools such as Terraform and AWS CloudFormation.
  • Experience integrating observability practices into Continuous Integration / Continuous Delivery (CI/CD) pipelines.
  • Relevant certifications such as Certified Kubernetes Administrator (CKA) or Terraform Associate (preferred).

Job ID: 145648169