
Job Description:
As an L3, you are the technical anchor for complex production issues across our DSP data and bidding surfaces. You'll lead SEV1/2 incident bridges, perform deep SQL and system investigations, implement durable fixes, and partner with Platform/Data/Backend Engineering on performance and cost optimization. You'll mentor L2s, harden runbooks, and raise the reliability bar across Snowflake, Postgres, MySQL, and Athena.
Key Responsibilities:
Major incident leadership: Drive SEV1/2 resolution, coordinate war rooms, and communicate impact and mitigation clearly to U.S. business stakeholders and leadership.
Advanced diagnostics & performance tuning:
Snowflake: micro-partitioning, clustering, warehouse sizing/concurrency, query plan analysis.
Athena/Presto: partitioning, file formats (Parquet/ORC), partition pruning, S3 layout.
Postgres/MySQL: indexing strategies, query plans, connection pooling, vacuum/ANALYZE, parameter tuning.
Sustainable engineering fixes: Ship PRs or scripts for hotfixes, add guardrails and data quality checks, automate remediations, and eliminate toil for L2s.
DSP domain problem-solving: Debug bidder behavior (latency, QPS, timeout budgets), ORTB field mismatches, price-floor issues, segment join/refresh rates, and attribution/reporting gaps.
Observability & SLOs: Define and refine SLOs, create high-signal alerts and dashboards, and drive postmortem RCAs with action items to closure.
Mentorship & knowledge: Level up the L2 team through training, improved runbooks, and tooling; set escalation criteria and acceptance checklists.
Cost & performance stewardship: Optimize compute/storage spend (warehouse sizing, query rewrites, data retention strategies) while maintaining SLAs.
What We Are Looking For:
5-8+ years in production engineering/support for large-scale data or adtech platforms, including leading SEV1/2 incidents.
Expert-level SQL and proven tuning experience across Snowflake, Postgres, MySQL, and Athena.
Proficiency with Python/Bash, Linux, and at least one observability stack (Datadog/Grafana/Kibana).
Deep working knowledge of DSP/bidding ecosystems: ORTB, bidder latency envelopes, win-rate dynamics, pacing algorithms, audience pipelines, and creative approvals/brand safety.
Strong stakeholder communication with U.S. teams; comfortable writing exec-level incident summaries and presenting RCAs.
Sustainable mindset: prioritize root-cause elimination, automation, and change management that reduce risk and ticket load over time.
Nice to Have:
Experience with Kafka/Kinesis, S3/Glue/Lambda, and Looker/Mode/Tableau.
Familiarity with privacy/compliance (HIPAA/SOC 2, consent frameworks) and log-level data sharing practices.
Success Metrics:
Reduced SEV1/2 frequency and duration, percentage of recurring issues eliminated, performance/cost improvements delivered, quality of RCAs and engineering fixes, and L2 readiness uplift.
Work Hours & On-Call:
Core coverage 9:00 am to 6:00 pm ET, with flexibility for incident bridges.
Primary on-call rotation for critical incidents (with a backup L2 rotation).
Job ID: 141490435