Senior Staff Software Engineer - SRE, P5
Strength in Trust
OneTrust's mission is to enable innovation through the responsible use of data and AI. We believe that ensuring data is trusted shouldn't slow teams down—it should accelerate what's possible. This led us to develop the first technology platform for responsible data use in 2016. Today, with AI representing the latest and most impactful expansion of data yet, OneTrust is once again redefining what responsible innovation looks like. OneTrust, the AI‑Ready Governance Platform™, unifies regulatory intelligence, automation, and connected governance workflows so businesses can continue to move at the speed of AI while ensuring good governance to prevent data misuse at scale. Trusted by thousands of organizations worldwide, OneTrust is shaping the future where trusted data becomes a transformative force for business and society.
Role Overview
We are seeking a Senior Staff Software Engineer (SRE) who will be instrumental in ensuring the reliability, scalability, and performance of our platform. This role combines deep software engineering expertise with operational excellence to design resilient systems, drive reliability standards, and lead complex cross-functional initiatives.
Primary Background:
- Demonstrates exceptional technical depth and system-level thinking across distributed architectures.
- Takes end-to-end ownership of large platform components, driving both technical direction and execution.
- Can dive into any layer of the stack, troubleshoot complex production issues, and lead high-severity incident resolution.
- Defines and influences architecture, scalability, and reliability strategies aligned to business goals.
- Translates long-term strategic goals into multi-release roadmaps with measurable outcomes.
- Leads and mentors engineers, elevating the team's technical and operational maturity.
- Drives alignment across teams by incorporating multiple perspectives and making timely, well-informed decisions.
- Champions customer-first thinking, proactively identifying and solving reliability and performance challenges.
- Tackles ambiguous, open-ended problems and converts them into structured, actionable solutions.
Job Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
- 10+ years of experience in software engineering with a strong focus on backend systems and distributed architecture
- Extensive experience building and operating Java-based systems using RESTful APIs, Spring Boot, and microservices architecture
- Strong understanding of distributed systems concepts, including fault tolerance, eventual consistency, and scalability
- Proven experience with cloud platforms (AWS/Azure/GCP) and cloud-native architectures
- Expertise in observability tools (monitoring, logging, tracing) such as Prometheus, Grafana, ELK, or similar
- Experience defining and managing SLIs, SLOs, and error budgets
- Strong knowledge of CI/CD pipelines, automation, and infrastructure as code
- Hands-on experience with incident management, root cause analysis (RCA), and postmortems
- Excellent analytical, debugging, and problem-solving skills
- Strong communication, collaboration, and leadership abilities
Responsibilities:
- Design and build platforms, tools, and frameworks to improve system reliability, scalability, and performance
- Define and implement SRE best practices, including SLIs/SLOs, error budgets, and reliability metrics
- Lead incident response efforts, drive root cause analysis, and implement long-term fixes to prevent recurrence
- Analyze system behavior, identify bottlenecks and saturation points, and implement solutions to improve resilience
- Partner with engineering teams to embed reliability into the software development lifecycle
- Evaluate emerging technologies and recommend tools that enhance productivity, observability, and system robustness
- Drive capacity planning, performance tuning, and cost optimization efforts
- Collaborate with cross-functional teams to identify gaps, prioritize improvements, and resolve production issues
- Provide technical leadership and mentorship across the engineering organization
- Influence senior leadership with insights, metrics, and recommendations to improve system health and operational excellence