
About the Role:
We are looking for a Staff Engineer to join our Cloud Platform team and take deep ownership of key platform systems within our multi-tenant SaaS infrastructure. You will work at the intersection of backend engineering and cloud infrastructure — designing, building, and operating the systems that power our control plane and data plane at scale.
This is a high-impact, hands-on role. You will be a technical leader within the team — driving design decisions for significant platform initiatives, growing engineers around you, and partnering with senior engineers and product leads to ship reliable, scalable platform capabilities.
What You Will Do:
Architecture & Design
• Lead architecture and design for significant platform subsystems spanning control plane and data plane
• Design and implement multi-tenant components with strong isolation, security, and resource governance
• Define platform abstractions across cloud environments — AWS, GCP, Azure, and on-premises
• Participate actively in architecture reviews and RFCs, contributing well-reasoned, documented design proposals
Control Plane
• Build and evolve systems for tenant provisioning, lifecycle management, and configuration
• Implement cluster orchestration and management workflows that operate reliably at scale
• Develop APIs and automation enabling self-service for operators and tenants
• Ensure your areas of ownership are highly available, auditable, and observable
Data Plane
• Implement data path components for high-throughput, low-latency workloads
• Build and enforce isolation boundaries between tenants at the data layer
• Optimize for performance, reliability, and cost efficiency within your feature areas
Engineering Excellence
• Uphold and contribute to technical standards and coding practices for the platform team
• Identify and address reliability, scalability, and security risks within your scope
• Partner with SRE and DevOps on observability, incident response, and capacity planning
Mentorship & Collaboration
• Mentor and grow mid-level engineers through code review, design feedback, and pairing
• Contribute to hiring as an interviewer; help calibrate the bar for platform engineers
• Work closely with cross-geo engineering teams to align on platform direction and delivery
Must Have
SaaS Platform & Multi-Tenancy
• Hands-on experience building or operating multi-tenant SaaS platforms — silo/pool/bridge models, noisy neighbor mitigation, tenant resource quotas, and lifecycle automation (onboarding, provisioning, offboarding)
• Familiarity with data isolation strategies — schema-per-tenant, database-per-tenant, row-level security, and per-tenant encryption at rest
Control Plane & Data Plane
• Experience with control plane/data plane separation and Kubernetes cluster management, including operators, CRDs, admission webhooks, RBAC, and namespace isolation
• Working knowledge of configuration management at scale — GitOps workflows, feature flags, and dynamic config propagation across distributed environments
Cloud & Infrastructure
• Solid hands-on experience with at least one of AWS, GCP, or Azure, including VPC, IAM, managed Kubernetes (EKS/GKE/AKS), IaC (Terraform/Pulumi), and cloud cost awareness
• Familiarity with service mesh technologies — Istio, Linkerd, or Envoy — for traffic management, mTLS, and microservices observability
Distributed Systems & Security
• Good understanding of distributed systems fundamentals: HA patterns, fault tolerance, observability (OpenTelemetry, Prometheus, Grafana), and resilience testing
• Awareness of zero-trust security, secrets management (Vault, AWS Secrets Manager), and compliance requirements (SOC 2, ISO 27001) at the infrastructure level
AI-Augmented Engineering
• Active use of AI coding assistants (e.g., Claude, Cursor, Copilot) for infrastructure tasks, runbook generation, and incident analysis
• Ability to craft effective prompts for engineering problems and to critically evaluate AI-generated output
Good to Have
• Exposure to FinOps practices — cost attribution per tenant, showback/chargeback models, and cloud cost anomaly detection
• Experience building or contributing to observability platforms — centralized logging pipelines, metrics dashboards (Grafana, Datadog), distributed tracing, and alerting systems
• Contributions to open-source infrastructure or platform projects
What Success Looks Like
In 3 months:
• Solid understanding of the current platform architecture, key systems, and near-term roadmap
• Shipped at least one meaningful improvement to the control plane or data plane with minimal guidance
• Established good working relationships with team members across engineering and product
In 6 months:
• Owning and delivering a significant platform feature or subsystem end-to-end
• Proactively identifying and raising reliability or scalability risks within your area
• Actively conducting interviews and contributing to team hiring
In 12 months:
• Recognized as a strong technical voice on the platform team for your area of ownership
• Delivered measurable improvements to platform reliability, scale, or developer experience
• Mid-level engineers you've worked with showing clear growth in design and execution quality
Job ID: 145806039