Responsibilities
- Assist in the design, implementation, and maintenance of observability solutions for Azure-based applications.
- Monitor system health, performance, and availability using Azure Monitor, Application Insights, and Log Analytics.
- Implement and maintain Azure alerts and Azure Log Analytics workspaces.
- Be able to learn other technologies such as SolarWinds, Elastic, and AWS CloudWatch.
- Support SRE practices by automating infrastructure tasks, incident response, and root cause analysis.
- Develop and maintain dashboards, alerts, and reports to provide insights into system performance.
- Troubleshoot and resolve issues related to Azure infrastructure and application performance.
- Collaborate with DevOps, development, and operations teams to improve system reliability and efficiency.
- Implement and manage logging, tracing, and metrics collection for microservices-based architectures.
- Assist in developing runbooks, playbooks, and documentation for incident management and resolution.
- Participate in on-call rotations and proactively address potential system failures.
Qualifications
- Bachelor's degree in Computer Science, IT, or a related field (or equivalent experience).
- Minimum of 7 years of experience.
- Basic knowledge of Azure services, including Virtual Machines, Kubernetes (AKS), Storage, and Networking.
- Familiarity with observability tools like Azure Monitor, Application Insights, Grafana, or Prometheus.
- Understanding of logging and tracing concepts using tools like Log Analytics, Elastic Stack, or OpenTelemetry.
- Exposure to scripting and automation using PowerShell, Python, or Terraform.
- Knowledge of CI/CD pipelines and Infrastructure as Code (IaC) principles.
- Strong problem-solving skills and ability to work in a fast-paced environment.
- Good communication and collaboration skills.
Preferred Qualifications
- Hands-on experience with Azure DevOps or GitHub Actions.
- Basic understanding of SRE principles, error budgets, and SLIs/SLOs.
- Exposure to containerization technologies like Docker and Kubernetes.
- Experience with ITIL practices and incident management processes.
- Certification in Azure Fundamentals (AZ-900) or Azure Administrator (AZ-104) is a plus.