
Role: Site Reliability Engineer
Function: Engineering / Site Reliability
Location: Bangalore
Type: Full-time
Industry: Marketing Research / Data & Analytics
About Company
A global leader in data-driven marketing research with 25+ years of experience. The company serves 4,000+ brands across Asia-Pacific using a network of 130 million+ consumer panelists.
It combines survey data, digital behavior, and purchase insights to deliver actionable research. Currently in a growth-through-acquisition phase, backed by approximately $848M in funding.
Engineering teams build cloud-native platforms that power research at massive scale.
Position Overview
As a Site Reliability Engineer, you will own the end-to-end platform and infrastructure that powers the company's research systems at scale. You'll work closely with engineering teams to build reliable, secure, and cost-efficient systems on GCP, driving automation, observability, and developer productivity across the organisation.
Role & Responsibilities
• Own and manage end-to-end cloud infrastructure on GCP including Compute Engine, GKE, Cloud SQL, Pub/Sub, and Cloud Storage
• Design, build, and maintain CI/CD pipelines using GitHub Actions to enable faster and safer deployments
• Implement and manage Infrastructure as Code using Terraform for all infrastructure provisioning and automation
• Build and enhance the observability stack (Datadog, OpenTelemetry) covering logging, metrics, and distributed tracing
• Lead incident management, root cause analysis, and post-mortem processes for production systems
• Define and maintain SLIs, SLOs, and error budgets to drive reliability decisions across services
• Automate operational processes, reduce toil, and support the onboarding of services onto the modern platform architecture
Must Have Criteria
• 4+ years of experience building and operating production systems at scale
• Hands-on experience with GCP services (Compute Engine, GKE, Cloud SQL, Cloud Storage, Pub/Sub)
• Proficiency in Terraform for infrastructure provisioning and management in production environments
• Experience running containerised workloads with Docker and Kubernetes (GKE) in production
• Experience building and maintaining CI/CD pipelines (GitHub Actions or equivalent)
• Hands-on experience with observability tools, specifically Datadog and/or OpenTelemetry (metrics, logs, traces)
• Programming experience in Go and scripting experience in Bash for automation and tooling
Nice to Have
• Hands-on experience with SRE practices: SLO-driven operations, error budgets, and reliability reviews
• Experience building internal developer platforms or platform engineering initiatives
• Business-level Japanese proficiency (JLPT N3 or equivalent) for collaboration with Japan-based teams
• Experience applying AI/ML tools to enhance SRE automation or incident response
• Open-source contributions or experience mentoring engineers on SRE/DevOps practices
What We Offer
• Opportunity to own and shape the entire platform infrastructure for a globally scaled research platform
• Work with a modern, cloud-native stack (GCP, Terraform, Datadog, Go) in an agile engineering culture
• Exposure to large-scale consumer data systems serving 4,000+ enterprise clients across Asia-Pacific
• Collaborative, transparent work culture with strong ownership and continuous learning
• Competitive compensation with growth opportunities in a company backed by $848M in funding
Job ID: 146407305