Years of Experience: 10+
Technologies: Adobe Experience Suite, React, Java, Commerce (preferably SAP Hybris), application monitoring tools, and GitHub
Process: ITIL, Scrum, and a support mindset
Responsibilities:
- Manage and lead our Web/eCommerce Site Reliability Engineering team as part of Digital Transformation, with the primary goal of making the production platform stable and reliable.
- Serve as the point of contact for Site Reliability Engineering and production support of the Web/eCommerce platform, which includes the Adobe Experience Management suite, SAP Commerce Cloud, Solr Search, MuleSoft-based API services, etc.
- Manage production incidents and own their closure.
- Deliver clear, concise, and timely communication to our customers to ensure their confidence in our team's passion for providing the best possible customer experience.
- Manage on-call rotations across continents, using a follow-the-sun model.
- Lead the SRE team and continuously assess and implement industry best practices for SRE.
- Own incident management, problem management, and service request management.
- Accountable for the production platform and its uptime, availability, stability, and capacity planning.
- Monitor baselines of technical KPIs such as uptime, performance, and error rate of the Web/eCommerce platform, and drive efforts to improve them with the help of other teams as needed.
- Drive the team to enhance monitoring and alerting for all technical components by creating dashboards, visualizations, baselines, and alerts.
- Provide 24x7 on-call support during on-call rotations, and be available outside working hours when needed for critical incidents or production releases.