Hi,
This is Sumana.
We are hiring for the primary skills listed below.
If you are interested, please share your resume at the email address below.
Location: Coimbatore, Bangalore, Chennai, Hyderabad, Pune, Mumbai
Primary Skills: Observability Platform Lead (Grafana-Centric)
Notice Period: Immediate to 15 days only
Experience: 5 to 8 years
Email: [Confidential Information]
Job Description: Observability Platform Lead (Grafana-Centric), 5 to 8 Years
Role Summary
We are seeking a Senior Observability Platform Engineer (7+ years) with deep specialization in the Grafana ecosystem and end-to-end observability platform engineering. This role owns the design, development, and scaling of a Grafana-first monitoring and observability stack, including multi-level integrations across telemetry backends and custom plugin development.
The engineer will also work hands-on with Golang-based services and integrations, and will use PostgreSQL for platform metadata, configuration, and plugin/backend persistence.
Key Responsibilities
1) Grafana Platform Engineering (Primary Focus)
- Design, deploy, and operate Grafana at scale (HA setups, performance tuning, RBAC, SSO/OIDC, multi-tenant patterns).
- Own Grafana provisioning-as-code (datasources, dashboards, alerting rules, contact points).
- Design and enforce enterprise dashboarding and alerting standards.
- Build scalable Grafana Alerting pipelines with noise reduction, deduplication, and on-call integrations.
2) Observability Stack Engineering (Metrics, Logs, Traces)
- Architect and operate telemetry pipelines across:
  - Metrics: Prometheus / Mimir / Cortex
  - Logs: Loki / Elastic / OpenSearch / LogScale
  - Traces: Tempo / Jaeger
- Design high-signal dashboards (Golden Signals, RED/USE, SLO views).
- Implement OpenTelemetry instrumentation and collector pipelines.
3) Multi-Level Integrations with Data Sources
- Build deep integrations between Grafana and:
  - Kubernetes and container platforms
  - Cloud-native services and managed monitoring systems
  - Databases, messaging systems, and custom internal platforms
- Enable metrics–logs–traces correlation via consistent labels, exemplars, and trace IDs.
- Implement consistent metadata strategies (service, environment, region, tenant).
4) Custom Grafana Plugin Development
- Design and build custom Grafana plugins, including:
  - Data source plugins for proprietary or internal systems
  - Panel plugins for domain-specific visualizations
  - App plugins for bundled observability experiences
- Develop plugins using TypeScript/React (Grafana UI framework) and backend services as needed.
- Ensure plugin security, performance, versioning, signing, and backward compatibility.
5) Golang-Based Backend & Integration Development
- Design and develop Golang-based services that support:
  - Custom telemetry ingestion or transformation
  - Backend components for Grafana plugins
  - Observability platform controllers, operators, or APIs
- Build performant REST/gRPC APIs for integration with Grafana and telemetry systems.
- Apply Go concurrency, memory, and performance best practices.
6) PostgreSQL Usage & Platform Persistence
- Use PostgreSQL for:
  - Plugin backend persistence
  - Observability platform metadata/configuration storage
  - Feature state, tenancy, or dashboard governance data
- Design efficient schemas, indexes, and queries for scale.
- Apply best practices for backups, migrations, performance tuning, and access control.
7) Security, Reliability & Operations
- Implement secure access patterns: SSO (OIDC/SAML), token-based auth, secrets management.
- Operate Grafana and observability components with SLO-driven reliability targets.
- Handle upgrades, incidents, and capacity planning for the platform itself.
- Establish governance for alert quality, dashboard ownership, and lifecycle management.
8) Enablement & Best Practices
- Build self-service onboarding paths for application and platform teams.
- Provide reusable dashboards, alert templates, and instrumentation guidelines.
- Coach teams on observability maturity and signal quality improvements.
Required Skills & Experience (Must-Have)
Experience
- 7+ years in platform engineering, SRE, or observability engineering roles.
- Strong hands-on ownership of Grafana as a platform, not just dashboard usage.
Grafana & Observability
- Advanced knowledge of:
  - Grafana HA, provisioning, RBAC, alerting internals
  - Prometheus data model and query optimization
  - OpenTelemetry concepts and pipelines
Golang
- Production-grade Golang development experience
- Building APIs, background workers, or controllers used in observability platforms
- Familiarity with Go tooling, testing, and CI/CD pipelines
PostgreSQL
- Hands-on experience with PostgreSQL in production
- Data modeling, indexing, query optimization, migrations, and backups
- Using PostgreSQL as a backend for platform services or plugins
Plugin Development (Critical)
- Proven experience developing custom Grafana plugins
- Strong TypeScript/React skills aligned with the Grafana plugin SDK
- Understanding of plugin signing, version compatibility, and performance constraints
Good-to-Have / Preferred
- Grafana Agent / Alloy and OpenTelemetry Collector tuning at scale
- Multi-tenant observability system design
- Cost optimization for metrics/logs retention and sampling
- Kubernetes operators/controllers written in Go
- Incident management integrations (PagerDuty, ServiceNow, Opsgenie)