- Technology->DevOps->Site Reliability Engineering (SRE)
What you will be doing

As part of the Application Observability (AppO) team, your responsibilities will include:
1. Defining and refining monitoring and alerting rules, both for the team and organisation-wide
2. Working together with other teams (Platform and Observability Backend) to enhance performance and fulfil user stories
3. Leading projects such as Grafana's migration from on-premises data centers to AWS, including planning, defining requirements, supervising and implementing
4. Improving the deployment of services using Git workflows and ArgoCD
5. Proposing and validating performance and user experience improvements for AppO services
6. Addressing issues, implementing preventive measures, and managing postmortems and related improvement tasks
7. Analysing performance, identifying anomalies, and defining, documenting and implementing corrective measures, while ensuring compliance with the SLA
8. Participating in the on-call rotation for team services, which requires the ability to resolve issues (using runbooks) and knowledge of tools such as Elasticsearch, Thanos, Kafka, OpenTelemetry, Grafana and Docker

Three key domains of exposure:
1. DevOps
2. Platform Engineering
3. Application Observability
- Good knowledge of software configuration management systems
- Strong business acumen, strategy and cross-industry thought leadership
- Awareness of the latest technologies and industry trends
- Logical thinking and problem-solving skills along with an ability to collaborate
- Knowledge of two or three industry domains
- Understanding of the financial processes for various types of projects and the various pricing models available
- Client-interfacing skills
- Knowledge of SDLC and agile methodologies
- Project and Team management