Role: SRE Lead Tools Engineer
Location: Kochi / Bangalore
Experience: 5-10 Years
Key Responsibilities.
- SRE Fundamentals & Reliability Engineering.
  - Apply core SRE principles:
    - Definition and governance of SLIs, SLOs, and SLAs.
    - Error budgets and reliability trade-offs.
    - Incident management and postmortems.
  - Collaborate with SRE L2/L3 teams to improve system reliability and performance.
  - Drive reduction in MTTR and enhance proactive detection strategies.
- Observability Strategy & Tool Recommendation (Core Responsibility).
  - Act as the central expert for Splunk and Dynatrace capabilities.
  - Analyze requirements from:
    - Application developers.
    - SRE L2/L3 engineers.
  - Evaluate and determine:
    - Whether to use Splunk, Dynatrace, or both.
    - The most efficient, scalable, and cost-effective approach.
  - Translate business and technical requirements into tool-specific solutions.
  - Recommend best practices, architecture, and design patterns.
  - Continuously evaluate new features and enhancements.
- Splunk Engineering.
  - Design and optimize logging and monitoring solutions.
  - Develop advanced SPL queries, dashboards, and alerts.
  - Define log onboarding strategies and data models.
  - Ensure data quality, governance, and cost efficiency.
  - Provide guidance on effective Splunk usage.
- Dynatrace Expertise.
  - Configure and optimize Dynatrace (APM, RUM, synthetic monitoring).
  - Leverage AI-driven anomaly detection and root cause analysis.
  - Map business transactions and critical user journeys.
  - Guide teams on best practices and tool utilization.
- Azure Observability.
  - Implement and integrate monitoring solutions in Microsoft Azure.
  - Work with:
    - Azure App Services.
    - AKS.
    - Azure Functions.
    - Azure Monitor, Log Analytics, Application Insights.
  - Ensure seamless integration across Azure, Splunk, and Dynatrace.
- Automation & Enablement.
  - Develop automation scripts (Python, PowerShell, Bash).
  - Enable self-service observability for engineering teams.
  - Integrate tools with ServiceNow, Jira, or similar platforms.
  - Provide documentation, standards, and reusable templates.
- Collaboration & Advisory.
  - Act as a trusted advisor to developers and SRE teams.
  - Conduct requirement intake and solution design sessions.
  - Provide training on observability best practices.
  - Drive adoption of standardized monitoring approaches.
Required Qualifications.
- 5+ years in SRE, DevOps, or Observability Engineering.
- Strong understanding of SRE principles (SLIs, SLOs, error budgets, incident management).
- Hands-on expertise in:
  - Splunk (log ingestion, SPL, dashboards, alerting).
  - Dynatrace (APM, RUM, synthetic monitoring).
- Strong experience with Microsoft Azure.
- Experience supporting large-scale, customer-facing platforms.
- Proficiency in Python, PowerShell, or Bash.
- Strong analytical and problem-solving skills.
Preferred Qualifications.
- Experience in retail/e-commerce environments.
- Knowledge of microservices and distributed systems.
- Experience with AKS, Docker, and containerized environments.
- Familiarity with tools like Prometheus, Grafana, ELK.
- Certifications in Splunk, Dynatrace, or Azure.
Key Skills.
- SRE Principles & Reliability Engineering.
- Observability Strategy & Tooling.
- Splunk & Dynatrace Expertise.
- Azure Monitoring & Integration.
- Requirement Analysis & Solution Design.
- Automation & Enablement.
- Stakeholder Communication.
Skills: SRE lead, SRE reliability engineering, Splunk, Dynatrace