TriNet

Principal Site Reliability Engineer

  • Posted 21 hours ago

Job Description

TriNet is a leading provider of comprehensive human resources solutions for small to midsize businesses (SMBs). We enhance business productivity by enabling our clients to outsource their HR function to one strategic partner, allowing them to focus on operating and growing their core businesses. Our full-service HR solutions include payroll processing, human capital consulting, employment law compliance, and employee benefits, including health insurance, retirement plans, and workers' compensation insurance.

TriNet has a nationwide presence and an experienced executive team. Our stock is publicly traded on the NYSE under the ticker symbol TNET. If you're passionate about innovation and making an impact on the large SMB market, come join us as we power our clients' business success with extraordinary HR.

Don't meet every single requirement? Studies have shown that many potential applicants discourage themselves from applying to jobs unless they meet every single requirement. TriNet always strives to hire the most qualified candidate for a particular role, ensuring we deliver outstanding results for our small and medium-size customers. So if you're excited about this role but your past experience doesn't align perfectly with every single qualification in the job description, remember that nobody's perfect, and we encourage you to apply. You may just be the right candidate for this or other roles.

Job Summary

Site Reliability Engineers at TriNet do more than understand how technology works; they also advocate for reliability with our in-house development teams, helping them develop and deliver flawless products consistently. This position is responsible for supporting TriNet's mission-critical platforms by identifying and driving improvements in infrastructure and system reliability, performance, high availability, observability, and overall platform stability, applying foundational SRE principles such as operations as code, removing toil, and failing fast through proactive monitoring.

Job Description

The SRE will work with engineering development teams, the Analytics organization, architects, and IT organizations to implement best practices for reliability and performance in the applications and services they support. Our ideal candidate is well-versed in modern cloud-based and on-premises architecture and experienced in designing systems for reliability, as well as in implementing monitoring, alerting, and ops automation to reliably operate and maintain the services they build.

The SRE works with developers to improve the reliability and resiliency of TriNet software solutions so that they meet business requirements, by implementing SRE tools, processes, and best practices. SRE is what happens when you ask a software engineer to design an operations function.

The SRE works with development teams to advise on how to design, develop, and test for reliability, and to automate tasks for applications. The SRE also helps troubleshoot incidents to address failure patterns, automate remediation through runbooks, and document application optimization.

Essential Duties

  • Collaborate with Engineering teams to support services before they go live through activities such as system design consulting; developing secure, reliable, and highly available software platforms and frameworks; monitoring/alerting; capacity planning; and production readiness and reliability reviews. 20%
  • Guide reliability practices through activities including architecture reviews, code reviews, capacity/scaling planning, and security vulnerability remediation. 15%
  • Conduct, coordinate, and oversee post-incident root cause analyses and reviews, and drive product improvements. 15%
  • Participate with other SRE leaders in setting the enterprise strategy for designing and developing resiliency in the application code. 15%
  • Participate in the on-call rotation for services owned by the SRE team, effectively triaging and resolving production and development issues. 5%
  • Perform code-level debugging on issues escalated to the team. 10%
  • Mentor junior and senior engineers and developers to help them grow and refine their SRE skills. 10%
  • Set reliability and automation standards, mentor and uplift the team, lead incidents calmly, collaborate empathetically across engineering, drive an SLO/error-budget culture, communicate transparently, and model strong ownership. 10%

Education


  • Bachelor's degree in Computer Science, Engineering, or a related field.

Experience


  • 12+ years of relevant experience in SRE/DevOps or similar roles.
  • At least 7 years of experience in public cloud (AWS, Azure, etc.) and container technologies.

Key Skills/Abilities


  • Strong experience with programming languages such as Java and Python.
  • Strong experience in high availability planning, capacity planning, and disaster recovery is required.
  • Technical proficiency: strong hands-on experience with Ansible or Terraform and with building services in AWS, and a strong understanding of in-memory data stores such as Redis and Memcached.
  • Experience working with IaC tools such as Terraform and Ansible and with managing Kubernetes services, including Helm.
  • In-depth knowledge of REST APIs, OAuth, OpenID Connect (OIDC), and SAML, with proven experience in implementing secure authentication and authorization mechanisms.
  • Knowledge of network protocols such as IPv4/IPv6, TCP/IP, UDP, FTP, SMTP, SSL, and HTTP/HTTPS.
  • Practical understanding of messaging technologies such as ActiveMQ and RabbitMQ.
  • Ability to leverage monitoring / logging analytics tools such as Prometheus, Grafana, Splunk and AppDynamics.
  • Ability to architect applications and solutions that are highly available, scalable, and highly fault tolerant.
  • Ability to stay cool-headed while troubleshooting production issues on incident bridges and to focus on problem resolution.
  • Hands-on experience with container technologies such as Docker and Kubernetes.
  • Deep understanding of concepts such as microservice architecture, middleware technologies, networking, databases, and observability.
  • Experience managing large-scale distributed web applications in an SRE role or capacity.
  • A security-first mindset when designing and architecting solutions.
  • Deep understanding of Linux/Unix operating systems, file systems, administration, and networking.
  • Ability to develop and maintain automation scripts using Ansible, Terraform, Python, and Java.
  • Hands-on experience with public cloud technologies (AWS, Azure, and OCI preferred).
  • Proficient in using configuration management tools like Ansible, Puppet, and Terraform.
  • Extensive experience in deploying and maintaining applications to Kubernetes using Docker, Jenkins, and Git.
  • Experience leveraging various monitoring tools such as Prometheus/Grafana, AppDynamics and CloudWatch to monitor and improve application availability and performance.
  • Ability to create documentation, runbooks, and train Tier I/Tier II teams to support day-to-day operations.
  • Ability to adapt to a fast-paced, constantly evolving business and work environment while managing multiple priorities with little supervision.
  • Exceptional debugging and analytical skills.
  • Ability to communicate well and thrive under pressure while collaborating and managing competing demands with tight deadlines.

License and Certification


  • Cloud Architect Certifications (AWS preferred)
  • Kubernetes Certifications (CKA preferred)

Work Environment


  • Work in a clean, pleasant, and comfortable office work setting. The work environment characteristics described here are representative of those an employee encounters while performing the essential functions of this job. Reasonable accommodations may be made to enable persons with disabilities to perform the essential functions.
  • This position is 100% in office.

Please Note: TriNet reserves the right to change or modify job duties and assignments at any time. The above job description is not all encompassing. Position functions and qualifications may vary depending on business necessity.

TriNet is an Equal Opportunity Employer and does not discriminate against applicants based on race, religion, color, disability, medical condition, legally protected genetic information, national origin, gender, sexual orientation, marital status, gender identity or expression, sex (including pregnancy, childbirth or related medical conditions), age, veteran status or other legally protected characteristics. Any applicant with a mental or physical disability who requires an accommodation during the application process should contact [Confidential Information] to request such an accommodation.

Job ID: 138350817