Job Description
Title: Platform Grafana Engineer (Cloud Observability)
Summary
We are looking for an experienced Grafana Engineer to design and manage cloud-native observability solutions. The role focuses on dashboards, alerting, and SLO/SLA reporting, using Grafana and its ecosystem to provide actionable insights for platform and product teams.
Key Responsibilities
Design and implement observability solutions across metrics, logs, and traces using Grafana and Prometheus-based systems.
Build and maintain dashboards, alerts, and SLO frameworks.
Integrate Grafana with cloud platforms such as AWS, Azure, or GCP and with their native monitoring tools, such as CloudWatch and Azure Monitor.
Manage data sources including Prometheus, Loki, Tempo, and OpenSearch.
Automate deployments using Terraform, CloudFormation, and Helm, following GitOps practices.
Support monitoring of Kubernetes and cloud-native workloads.
Ensure performance, scalability, security, and cost optimization of observability platforms.
Collaborate with engineering teams to improve monitoring, reliability, and incident response.
Required Skills
Strong hands-on experience with Grafana and observability tools.
Experience with cloud platforms such as AWS, Azure, or GCP.
Knowledge of Prometheus, Loki, and OpenTelemetry.
Good understanding of Kubernetes and distributed systems.
Experience with Infrastructure as Code and CI/CD pipelines.
Strong troubleshooting and problem-solving skills.
Preferred Skills
Experience with Kafka, Spark, Docker, and Kubernetes.
Knowledge of Terraform and CloudFormation.
Familiarity with NoSQL databases and monitoring tools.
Success Measures
Improved incident detection and resolution times.
Higher dashboard adoption and improved alert quality.
Optimized cost and performance of observability systems.