
Job Title: Site Reliability Engineer (SRE), New Relic Monitoring Specialist
Location: Chennai/Bangalore
Job Type: Full-Time
Role Overview
We are seeking an experienced Site Reliability Engineer (SRE) with deep expertise in New Relic monitoring and observability. The role involves designing, implementing, and managing end-to-end monitoring solutions using New Relic to ensure system reliability, proactive incident detection, and optimized application performance.
Key Responsibilities
Monitoring & Observability (New Relic Focus)
Configure and manage New Relic APM, Infrastructure, Browser, and Synthetic Monitoring for applications and services.
Develop and maintain custom dashboards, alerts, and health checks in New Relic to track system performance and business KPIs.
Continuously enhance observability strategies using New Relic Insights, Distributed Tracing, and Logs for proactive issue detection.
Incident Management & Troubleshooting
Analyze New Relic metrics, traces, and logs to identify latency, errors, and bottlenecks.
Collaborate with cross-functional teams to troubleshoot incidents and perform root cause analysis using New Relic data.
Optimize alerting mechanisms in New Relic to reduce noise and improve incident response efficiency.
Automation & CI/CD Integration
Automate monitoring configurations and alert setups using New Relic APIs and Terraform.
Integrate New Relic with CI/CD pipelines for real-time deployment visibility.
Integrate New Relic alerts with incident management and notification tools such as ServiceNow, PagerDuty, and Slack.
Operational Excellence & Reliability Engineering
Implement SRE best practices leveraging New Relic for reliability, scalability, and performance optimization.
Ensure adherence to SLAs and SLOs using New Relic Service Level Management features.
Documentation & Knowledge Sharing
Create and maintain runbooks for New Relic monitoring workflows.
Document incident handling strategies and educate teams on New Relic dashboards and troubleshooting techniques.
Core Competencies
Strong hands-on experience with New Relic APM, Infrastructure, Logs, and Distributed Tracing.
Expertise in creating custom dashboards, alerts, and automation scripts for New Relic.
Ability to analyze application performance metrics and traces for issue resolution.
Familiarity with CI/CD pipelines, ITSM and incident response tools (ServiceNow, PagerDuty), and cloud environments (AWS, Azure, GCP).
Knowledge of containers and container orchestration (Docker, Kubernetes) and infrastructure automation.
Strong analytical and problem-solving skills focused on improving system resilience.
Preferred Skills
Experience with New Relic APIs and Terraform for Infrastructure as Code (IaC).
Knowledge of SRE principles and operational excellence frameworks.
Excellent communication and documentation skills.
Job ID: 135949157