Arokee Online Solutions

Lead Site Reliability Engineer

  • Posted 5 hours ago

Job Description

About the role

We're hiring a Lead SRE to set the reliability bar for a multi-sided marketplace serving global users (primarily US). You'll define how we run production across multiple clouds and four environments, lead incident response, and build the practices the rest of the SRE team operates by.

What you'll own

  • Reliability strategy: SLOs, error budgets, and uptime targets across all critical services
  • Incident command for Sev-1s: triage, communication, resolution, and RCAs that actually prevent repeats
  • Multi-cloud architecture decisions (Azure primary, GCP secondary) and the operational contract between hosted frontend/backend platforms and our own container workloads
  • Release engineering across web and mobile (Android + iOS): staged rollouts, OTA updates, rollback discipline, and per-environment previews across dev, QA, staging, and prod
  • Monitoring and alerting strategy end-to-end: dashboards, signal-to-noise, paging policy, and on-call health
  • Third-party reliability posture for payments (Stripe), comms (Twilio, LiveKit), and AI/LLM relays, with monitoring and graceful-degradation playbooks
  • DR strategy, backup validation, and compliance-relevant operations for a global marketplace
  • Hiring, mentoring, and on-call rotation design for the SRE team

Must have

  • 7–10 years in production SRE roles, with time spent leading incident response at scale
  • Deep experience operating SaaS on at least one major cloud (Azure preferred); working knowledge of a second
  • Track record running services on hosted platforms (e.g., Vercel-class) alongside owned container infrastructure
  • Mobile release pipelines for Android + iOS across multiple environments
  • Strong opinions on observability, alerting hygiene, and postmortem culture
  • Calm incident command; clear writing; comfort with cross-time-zone operations

Nice to have

Marketplace or two-sided-platform experience, real-time communications operations, LLM/AI service reliability, SOC 2 or CCPA exposure.

Job ID: 147491001