Senior Data Platform Engineer (Data Lake)

gobblecube

Gurugram, India

4-7 Years

Posted 2 days ago

Job Description

GobbleCube is an agentic growth operating system designed to help brands scale profitably across digital marketplaces. Founded in November 2022 by the former core team of Blinkit, the platform moved from private beta to commercial launch in September 2024.

Since then, GobbleCube has scaled rapidly, supporting 400+ brands across enterprise and D2C, growing revenue 10X, and onboarding 45 of India's largest CPG enterprises in just the last 9 months, including HUL, Nivea, Tata Consumer Products, ITC, Godrej, Beiersdorf, MTR, L'Oréal, Hershey's, and many more.

As commerce becomes more distributed and hyperlocal, GobbleCube is building agentic solutions to help brands navigate complexity in real time, across visibility, performance marketing, supply chain, and growth strategy.

Headquartered in Gurugram, the company supports brands across India, MENA, and LATAM.

Why this role exists

We run a large analytics platform: 200TB+ of active storage today, sitting on a proprietary storage system that we are actively moving off. We are building an open lakehouse on AWS to replace it, and we need someone who can own that migration at this scale, not just contribute to it.

This is a high-ownership role. You will lead the data lake migration work and own the platform that comes out of it: its architecture, its cost, its security, and its reliability. You will work directly with engineering leadership and have a real say in how the platform is built.

What you will own

Lead the build-out of our lakehouse on AWS end to end, from architecture and sequencing through to rollout.
Architect the platform for scale: storage layout and partitioning, table format and conventions, compaction and file-sizing strategy across 200TB+ of active data.
Own platform cost. Build the cost model, drive infra spend down, and manage compute efficiency so that spend does not grow linearly as the platform scales.
Own security and governance for the platform: access control, network isolation, encryption, and keeping us aligned with our SOC 2 and ISO 27001 obligations.
Own reliability and maintenance: data quality gates, validation pipelines, SLAs, and observability. Make the platform robust enough that the rest of engineering can build on it without thinking twice.
Design for multi-tenancy across our quick-commerce and CPG customers, balancing isolation, performance and cost.
Set and enforce shared data standards and conventions, and raise the bar for the engineers around you.
Partner with PMs, data and product teams to turn ambiguous business problems into scalable pipelines.

What we are looking for

4-5+ years of industry experience, with a strong chunk of it in data platform or data lake work.
Bachelor's and/or Master's in CS, or equivalent experience, with solid CS fundamentals (data structures, algorithms, complexity, system design).
Deep hands-on experience with Spark and a modern open table format (Iceberg, Delta or Hudi) on a cloud platform, AWS preferred.
Real experience operating data at scale (multi-terabyte) on a cloud data warehouse or data lake.
Strong programming ability in Python and Go.
A track record of owning infrastructure cost and making meaningful, measurable reductions.
Exposure to data security and governance: IAM, network controls, access policies, and ideally working within a compliance framework like SOC 2 or ISO 27001.
Comfort with ambiguity and the ability to communicate clearly across team boundaries.
A product and customer-first mindset. You care that what you build actually serves end users.

Nice to have

Experience with orchestration systems (Dagster, Temporal or Airflow).
Experience with dbt.
Experience running data infrastructure on Kubernetes.
Experience with ClickHouse, Postgres at scale, or other OLAP / OLTP systems.
Experience in quick-commerce, retail, or e-commerce data.
Experience building or scaling a lakehouse, or leading a platform program at scale.

Our tech stack

Python · Go · Spark · dbt · Dagster · Postgres · Snowflake · AWS