Site Reliability Engineer

nexionpro services

Hyderabad, India

5-10 Years

Posted 21 hours ago

Job Description

SRE Observability Developer

Location: Hyderabad | Exp: 5–10 Years | Focus: Observability-as-Code & Automation

Role Overview

We are hiring a Site Reliability Engineer (SRE) to mature the observability and root cause analysis (RCA) capabilities of our high-scale UPI payment platforms. This is a hands-on, code-driven role focused on building reliable telemetry pipelines, transaction correlation, and automated alerting frameworks. You will treat monitoring configurations as code to ensure consistent, scalable operational intelligence.

Key Responsibilities

Telemetry Standardization: Build and standardize metrics, logs, and traces across app, middleware, and infra layers. Implement custom tags/attributes for unified drill-down dashboards.
Transaction Correlation: Enable correlation for asynchronous UPI flows to provide end-to-end visibility across distributed services.
SLO & Alert Engineering: Define Golden Signals and SLIs for critical journeys (P2P, P2M). Implement Alert-as-Code using config-based anomaly detection and noise-reduction logic.
Observability-as-Code: Automate the provisioning of Grafana dashboards, alert rules, and collector configurations (OTel/Fluentd) using version-controlled scripts.
RCA & Intelligence: Build RCA-focused views for Redis, Kafka, YugabyteDB, and Nginx. Use synthetic monitoring and black-box exporters to gain visibility into systems we only partially control.
Operational Integration: Convert incident learnings into automated telemetry patterns. Embed observability validation into deployment and release workflows.

Must-Have Skills

1. Observability Stack

Expertise: Prometheus/VictoriaMetrics, VictoriaLogs/VictoriaTraces, OpenTelemetry (OTel), and Fluentd.
Tooling: Advanced Grafana, Alertmanager, and various infrastructure exporters.
Development: Ability to develop custom exporters using OpenTelemetry SDKs for business- and transaction-specific metrics.

2. Systems & Middleware

Knowledge: Deep understanding of Redis, Kafka, Nginx, and YugabyteDB (or similar distributed DBs).
App Tier: Proficiency with JVM/Spring Boot Actuator metrics and asynchronous request/response patterns.
Environment: Experience with high-scale, low-latency platforms; UPI/payments domain experience is highly preferred.

3. Scripting & Automation

Core Skills: Strong Python and Shell/Bash for automating telemetry validation and collector lifecycle management.
Mindset: Ability to treat all monitoring assets (dashboards, rules, configs) as code artifacts.

What We're Looking For

An engineer who sees a dashboard as a product of code, not just a UI task.
Strong debugging skills across complex, on-prem distributed systems.
The ability to bridge the gap between what happened and where the code failed through advanced correlation.

More Info

Job Type: Permanent Job

Industry: Other

Function: SRE (Site Reliability Engineering)

Employment Type: Full time

About Company

nexionpro services

Job Source: www.linkedin.com

Job ID: 148892661

Last Updated: 10-06-2026 09:26:31 AM
