Job description
Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? If yes, come join our team and develop your career.
The Senior Site Reliability Engineer will analyze chronic and major issues, evaluate products and their services, and make recommendations to improve service outcomes. The role designs solutions in partnership with product, engineering, and architecture teams, and builds, tests, and operationalizes tools and applications to improve customer experience and reduce costs. Additionally, the Senior Site Reliability Engineer will provide oversight and coaching to engineers and serve as an escalation point for our global command center engineers.
About the Role:
In this opportunity as Senior Site Reliability Engineer, you will be responsible for:
- Operational Excellence: Drive the implementation of best practices for reliability, scalability, and performance across our systems and services. Establish and monitor key metrics to ensure uptime, availability, and response times meet or exceed SLAs. Lead the work to drive efficiencies and reduce service operations risks. Lead the research of new capabilities, test new solutions, and recommend and implement new technologies to improve customer experience and reduce costs.
- System Architecture: Collaborate with cross-functional teams to design, build, and maintain scalable and resilient architectures for our cloud-based infrastructure and applications. Identify opportunities for optimization and efficiency improvements. Solve intractable problems and devise solutions to improve the products and services we offer our customers.
- DevOps Practices: Promote and implement DevOps principles and practices to streamline software delivery, automate infrastructure provisioning, and improve deployment processes. Collaborate with development teams to integrate SRE practices into the software development lifecycle.
- Automation and Tooling: Champion the use of automation and tooling to streamline operational workflows, increase efficiency, and reduce manual toil. Drive the development of monitoring, alerting, and automation solutions to proactively identify and remediate issues.
- Continuous Improvement: Promote a culture of continuous improvement by fostering innovation, experimentation, and learning within the team. Encourage knowledge sharing and professional development to enhance technical skills and expertise.
About You:
You're a fit for the role of Senior Site Reliability Engineer if:
- You have at least 7 years of experience with cloud technologies and services (e.g., AWS, Azure, GCP), including use of their APIs and configuration tools.
- You have strong problem-solving and analytical skills, with a proactive approach to identifying and resolving complex technical issues.
- You are proficient in DevOps practices and methodologies, with hands-on experience in CI/CD pipelines, configuration management, and infrastructure as code.
- You use AI/ML tools to help improve service and reduce costs, and you have worked with AI-Operations solutions.
- You are familiar with programming languages such as Python, Java, or C#.
- You have designed and supported scalable systems and services.
- You are able to demonstrate ownership of accountabilities.
- You are proficient with networking, Windows, Linux, containers, PostgreSQL, or related infrastructure services at scale.
- You can automate tasks to improve service operations and support.
- You use configuration management tools to manage configuration at scale.
- You apply the scientific method to system components to identify improvements.
- You are proficient in observability tools such as Datadog or New Relic.
- You are proficient in analyzing data from sources such as SQL, S3, and Athena.
What's in it For You?
Join us to inform the way forward with the latest AI solutions and address real-world challenges in legal, tax, compliance, and news. Backed by our commitment to continuous learning and market-leading benefits, you'll be prepared to grow, lead, and thrive in an AI-enabled future. This includes:
- Industry-Leading Benefits: We offer comprehensive benefit plans to include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing.
- Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year and a hybrid model, empowering employees to achieve a better work-life balance.
- Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow's challenges and deliver real-world solutions. Our skills-first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.
- Culture: Globally recognized and award-winning reputation for inclusion, innovation, and customer focus. Our eleven business resource groups nurture our culture of belonging across the diverse backgrounds and experiences represented across our global footprint.
- Hybrid Work Model: We've adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected.
- Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives.