  • Posted 19 hours ago

Job Description

Key Responsibilities

  • SRE Fundamentals & Reliability Engineering
    • Apply core SRE principles:
      • Definition and governance of SLIs, SLOs, and SLAs.
      • Error budgets and reliability trade-offs.
      • Incident management and postmortems.
    • Collaborate with SRE L2/L3 teams to improve system reliability and performance.
    • Drive reduction in MTTR and enhance proactive detection strategies.
  • Observability Strategy & Tool Recommendation (Core Responsibility)
    • Act as the central expert for Splunk and Dynatrace capabilities.
    • Analyze requirements from:
      • Application developers.
      • SRE L2/L3 engineers.
    • Evaluate and determine:
      • Whether to use Splunk, Dynatrace, or both.
      • The most efficient, scalable, and cost-effective approach.
    • Translate business and technical requirements into tool-specific solutions.
    • Recommend best practices, architecture, and design patterns.
    • Continuously evaluate new features and enhancements.
  • Splunk Engineering
    • Design and optimize logging and monitoring solutions.
    • Develop advanced SPL queries, dashboards, and alerts.
    • Define log onboarding strategies and data models.
    • Ensure data quality, governance, and cost efficiency.
    • Provide guidance on effective Splunk usage.
  • Dynatrace Expertise
    • Configure and optimize Dynatrace (APM, RUM, synthetic monitoring).
    • Leverage AI-driven anomaly detection and root cause analysis.
    • Map business transactions and critical user journeys.
    • Guide teams on best practices and tool utilization.
  • Azure Observability
    • Implement and integrate monitoring solutions in Microsoft Azure.
    • Work with:
      • Azure App Services.
      • AKS.
      • Azure Functions.
      • Azure Monitor, Log Analytics, Application Insights.
    • Ensure seamless integration across Azure, Splunk, and Dynatrace.
  • Automation & Enablement
    • Develop automation scripts (Python, PowerShell, Bash).
    • Enable self-service observability for engineering teams.
    • Integrate tools with ServiceNow, Jira, or similar platforms.
    • Provide documentation, standards, and reusable templates.
  • Collaboration & Advisory
    • Act as a trusted advisor to developers and SRE teams.
    • Conduct requirement intake and solution design sessions.
    • Provide training on observability best practices.
    • Drive adoption of standardized monitoring approaches.

Required Qualifications

  • 5+ years in SRE, DevOps, or Observability Engineering.
  • Strong understanding of SRE principles (SLIs, SLOs, error budgets, incident management).
  • Hands-on expertise in:
    • Splunk (log ingestion, SPL, dashboards, alerting).
    • Dynatrace (APM, RUM, synthetic monitoring).
  • Strong experience with Microsoft Azure.
  • Experience supporting large-scale, customer-facing platforms.
  • Proficiency in Python, PowerShell, or Bash.
  • Strong analytical and problem-solving skills.

Preferred Qualifications

  • Experience in retail/e-commerce environments.
  • Knowledge of microservices and distributed systems.
  • Experience with AKS, Docker, and containerized environments.
  • Familiarity with tools such as Prometheus, Grafana, and the ELK Stack.
  • Certifications in Splunk, Dynatrace, or Azure.

Key Skills

  • SRE Principles & Reliability Engineering.
  • Observability Strategy & Tooling.
  • Splunk & Dynatrace Expertise.
  • Azure Monitoring & Integration.
  • Requirement Analysis & Solution Design.
  • Automation & Enablement.
  • Stakeholder Communication.

More Info

Job ID: 146393775