Meet The Team
Dexcom is looking for an experienced, software-centric Senior Site Reliability Engineer to join our R&D Platform team. In this role, you will be a key driver in building and evolving the resilient cloud infrastructure that supports life-changing medical technologies. You will bridge the gap between traditional SRE and AI-Native Engineering, scaling distributed systems and implementing agentic workflows that ensure our platforms remain secure, highly available, and 10X-ready.
As a mid-level member of the team, you will focus on systemic reliability through code. You will tackle architectural challenges related to low-latency data streaming and high-concurrency environments. This is an opportunity for a seasoned engineer to move the needle on Agentic SDLC, building self-healing systems that replace manual operational intervention with intelligent, software-driven solutions.
Where You Come In
- Agentic Architecture & OPAL: Take ownership of portions of the OPAL (Operations Performed by Agentic Layers) initiative. Design and deploy standardized AI agents and MCP (Model Context Protocol) servers to automate complex SDLC and operational tasks.
- Observability Engineering: Design and refine the observability stack to provide deep insights into distributed tracing and system performance, using data-driven analysis to predict and prevent outages.
- Cloud & Infrastructure Ownership: Architect and provision software-defined, scalable infrastructure on GCP. You will lead infrastructure projects from design to deployment with minimal supervision.
- Orchestration Mastery: Optimize Kubernetes scheduler behavior and resource utilization patterns. Implement advanced traffic management and service mesh configurations to improve microservices orchestration.
- Advanced Incident Management: Lead root-cause analysis for complex distributed systems disruptions. Develop long-term programmatic fixes and automated recovery patterns to eliminate entire classes of failure.
- Internal Tooling Development: Build internal software services and agentic layers that treat infrastructure as a software product, abstracting away complexity for our development teams.
- Mentorship & Review: Actively lead design reviews and facilitate blameless post-mortems. Mentor junior engineers in reliability-first design and modern systems programming practices.
What Makes You Successful
- Systems Engineering & Logic: Advanced understanding of data structures, algorithms, and software design patterns. Proven proficiency in a systems language (Go strongly preferred, or Python) with experience writing concurrent, high-performance code.
- AI-Native Mindset: Demonstrated experience or deep interest in Agentic SDLC, including the programmatic integration of LLMs (e.g., Gemini) into engineering workflows.
- Systems Internals: First-principles understanding of Linux internals (cgroups, namespaces, I/O) and advanced networking (BGP, Load Balancing, HTTP/3, gRPC).
- Methodical Architecture: You view infrastructure through the lens of software engineering, prioritizing modularity, testability, and self-healing capabilities.
- Analytical Leadership: Ability to articulate complex technical challenges to stakeholders and drive consensus on architectural decisions.
What You'll Get
- A front-row seat to life-changing CGM technology. Learn about our brave #dexcomwarriors community.
- A full and comprehensive benefits program.
- Growth opportunities on a global scale.
- Access to career development through in-house learning programs and/or qualified tuition reimbursement.
- An exciting and innovative, industry-leading organization committed to our employees, customers, and the communities we serve.
Travel Required
Experience And Education
- Education: Bachelor's degree in Computer Science or a related engineering field.
- Experience: 5-8 years of professional experience in SRE, Distributed Systems, or Software Engineering.
- Proven Track Record: Experience managing production workloads in Kubernetes and Terraform/Pulumi at scale.
- AI/Agentic Skills: Hands-on experience integrating AI agents, building connectors, or automating workflows via LLM APIs is a significant advantage.
- Certifications: CKA (Certified Kubernetes Administrator) or Google Cloud Professional Cloud Architect is highly preferred.