LTIMindtree

Observability Platform Lead (Grafana-Centric)

  • Posted 5 hours ago

Job Description

Hi,

This is Sumana. We are hiring for the primary skills listed below.

Interested candidates, please share your resume at the email address below.

Location: Coimbatore, Bangalore, Chennai, Hyderabad, Pune, Mumbai

Primary Skills: Observability Platform Lead (Grafana-Centric)

Notice Period: Immediate to 15 days only

Experience: 5 to 8 years

Email: [Confidential Information]

Job Description: Observability Platform Lead (Grafana-Centric), 5-8 Years

Role Summary

We are seeking a Senior Observability Platform Engineer (7+ years) with deep specialization in the Grafana ecosystem and end-to-end observability platform engineering. This role owns the design, development, and scaling of a Grafana-first monitoring and observability stack, including multi-level integrations across telemetry backends and custom plugin development.

The engineer will also work hands-on with Golang-based services and integrations and leverage PostgreSQL for platform metadata, configuration, and plugin/backend persistence.

Key Responsibilities

1) Grafana Platform Engineering (Primary Focus)

  • Design, deploy, and operate Grafana at scale (HA setups, performance tuning, RBAC, SSO/OIDC, multi-tenant patterns).
  • Own Grafana provisioning-as-code (datasources, dashboards, alerting rules, contact points).
  • Design and enforce enterprise dashboarding and alerting standards.
  • Build scalable Grafana Alerting pipelines with noise reduction, deduplication, and on-call integrations.

2) Observability Stack Engineering (Metrics, Logs, Traces)

  • Architect and operate telemetry pipelines across:
      • Metrics: Prometheus / Mimir / Cortex
      • Logs: Loki / Elastic / OpenSearch / LogScale
      • Traces: Tempo / Jaeger
  • Design high-signal dashboards (Golden Signals, RED/USE, SLO views).
  • Implement OpenTelemetry instrumentation and collector pipelines.

3) Multi-Level Integrations with Data Sources

  • Build deep integrations between Grafana and:
      • Kubernetes and container platforms
      • Cloud-native services and managed monitoring systems
      • Databases, messaging systems, and custom internal platforms
  • Enable metrics–logs–traces correlation via consistent labels, exemplars, and trace IDs.
  • Implement consistent metadata strategies (service, environment, region, tenant).

4) Custom Grafana Plugin Development

  • Design and build custom Grafana plugins, including:
      • Data source plugins for proprietary or internal systems
      • Panel plugins for domain-specific visualizations
      • App plugins for bundled observability experiences
  • Develop plugins using TypeScript/React (Grafana UI framework) and backend services as needed.
  • Ensure plugin security, performance, versioning, signing, and backward compatibility.

5) Golang-Based Backend & Integration Development

  • Design and develop Golang-based services that support:
      • Custom telemetry ingestion or transformation
      • Backend components for Grafana plugins
      • Observability platform controllers, operators, or APIs
  • Build performant REST/gRPC APIs for integration with Grafana and telemetry systems.
  • Apply Go concurrency, memory, and performance best practices.

6) PostgreSQL Usage & Platform Persistence

  • Use PostgreSQL for:
      • Plugin backend persistence
      • Observability platform metadata/configuration storage
      • Feature state, tenancy, or dashboard governance data
  • Design efficient schemas, indexes, and queries for scale.
  • Follow best practices for backup, migration, performance tuning, and access control.

7) Security, Reliability & Operations

  • Implement secure access patterns: SSO (OIDC/SAML), token-based auth, secrets management.
  • Operate Grafana and observability components with SLO-driven reliability targets.
  • Handle upgrades, incidents, and capacity planning for the platform itself.
  • Establish governance for alert quality, dashboard ownership, and lifecycle management.

8) Enablement & Best Practices

  • Build self-service onboarding paths for application and platform teams.
  • Provide reusable dashboards, alert templates, and instrumentation guidelines.
  • Coach teams on observability maturity and signal quality improvements.

Required Skills & Experience (Must-Have)

Experience

  • 7+ years in platform engineering, SRE, or observability engineering roles.
  • Strong hands-on ownership of Grafana as a platform, not just dashboard usage.

Grafana & Observability

  • Advanced knowledge of:
      • Grafana HA, provisioning, RBAC, alerting internals
      • Prometheus data model and query optimization
      • OpenTelemetry concepts and pipelines

Golang

  • Production-grade Golang development experience
  • Building APIs, background workers, or controllers used in observability platforms
  • Familiarity with Go tooling, testing, and CI/CD pipelines

PostgreSQL

  • Hands-on experience with PostgreSQL in production
  • Data modeling, indexing, query optimization, migrations, and backups
  • Using PostgreSQL as a backend for platform services or plugins

Plugin Development (Critical)

  • Proven experience developing custom Grafana plugins
  • Strong TypeScript/React skills aligned with the Grafana plugin SDK
  • Understanding of plugin signing, version compatibility, and performance constraints

Good-to-Have / Preferred

  • Grafana Agent / Alloy and OpenTelemetry Collector tuning at scale
  • Multi-tenant observability system design
  • Cost optimization for metrics/logs retention and sampling
  • Kubernetes operators/controllers written in Go
  • Incident management integrations (PagerDuty, ServiceNow, Opsgenie)

Job ID: 147478315