recrew ai

Site Reliability Engineer

  • Posted 13 hours ago

Job Description

Role: Site Reliability Engineer

Function: Engineering / Site Reliability

Location: Bangalore

Type: Full-time

Industry: Marketing Research / Data & Analytics

About Company

A global leader in data-driven marketing research with 25+ years of experience. The company serves 4,000+ brands across Asia-Pacific using a network of 130 million+ consumer panelists.

It combines survey data, digital behavior, and purchase insights to deliver actionable research. The company is currently in a growth-through-acquisition phase and is backed by approximately $848M in funding.

Engineering teams build cloud-native platforms that power research at massive scale.

Position Overview

As a Site Reliability Engineer, you will own the end-to-end platform and infrastructure that powers the company's research systems at scale. You'll work closely with engineering teams to build reliable, secure, and cost-efficient systems on GCP — driving automation, observability, and developer productivity across the organisation.

Role & Responsibilities

• Own and manage end-to-end cloud infrastructure on GCP including Compute Engine, GKE, Cloud SQL, Pub/Sub, and Cloud Storage

• Design, build, and maintain CI/CD pipelines using GitHub Actions to enable faster and safer deployments

• Implement and manage Infrastructure as Code using Terraform for all infrastructure provisioning and automation

• Build and enhance the observability stack (Datadog, OpenTelemetry) covering logging, metrics, and distributed tracing

• Lead incident management, root cause analysis, and post-mortem processes for production systems

• Define and maintain SLIs, SLOs, and error budgets to drive reliability decisions across services

• Automate operational processes, reduce toil, and support the onboarding of services to the modern platform architecture

Must Have Criteria

• 4+ years of experience building and operating production systems at scale

• Hands-on experience with GCP services (Compute Engine, GKE, Cloud SQL, Cloud Storage, Pub/Sub)

• Proficiency in Terraform for infrastructure provisioning and management in production environments

• Experience running containerised workloads with Docker and Kubernetes (GKE) in production

• Experience building and maintaining CI/CD pipelines (GitHub Actions or equivalent)

• Hands-on experience with observability tools — specifically Datadog and/or OpenTelemetry (metrics, logs, traces)

• Programming experience in Go and scripting experience in Bash for automation and tooling

Nice to Have

• Hands-on experience with SRE practices: SLO-driven operations, error budgets, and reliability reviews

• Experience building internal developer platforms or platform engineering initiatives

• Business-level Japanese proficiency (JLPT N3 or equivalent) for collaboration with Japan-based teams

• Experience applying AI/ML tools to enhance SRE automation or incident response

• Open-source contributions or experience mentoring engineers on SRE/DevOps practices

What We Offer

• Opportunity to own and shape the entire platform infrastructure for a globally scaled research platform

• Work with a modern, cloud-native stack (GCP, Terraform, Datadog, Go) in an agile engineering culture

• Exposure to large-scale consumer data systems serving 4,000+ enterprise clients across Asia-Pacific

• Collaborative, transparent work culture with strong ownership and continuous learning

• Competitive compensation with growth opportunities in a company backed by approximately $848M in funding

Job ID: 146407305
