slice

Analytics Engineer 2 - Data Platform

Job Description

About Us:

slice

A new bank for a new India.

slice's purpose is to make the world better at using money and time, with a major focus on building the best consumer experience for your money. We've all felt how slow, confusing, and complicated banking can be. So, we're reimagining it. We're building every product from scratch to be fast, transparent, and feel good, because we believe that the best products transcend demographics, like how great music touches most of us.

Our cornerstone products and services - slice savings account, slice UPI credit card, slice UPI, and slice business - are designed to be simple, rewarding, and completely in your control. At slice, you'll get to build things you'd use yourself and shape the future of banking in India. We tailor our working experience around the belief that the present moment is the only real thing in life, and that we find the most harmony in the present when we feel happy and successful together.

We're backed by some of the world's leading investors, including Tiger Global, Insight Partners, Advent International, Blume Ventures, and Gunosy Capital.

About the role:

We are looking for Analytics Engineers with 3-5 years of experience for the Data Platform team. The role has two parallel responsibilities. The first is building and owning data marts: the business-modelled analytical layer between our raw data lake and every team making decisions with data - fact tables, dimension tables, SCD management, quality tests, SLA adherence. Not all data marts are the same - some run on S3-backed Delta Lake for historical batch consumption, others on Pinot for near-real-time serving. You understand how the storage layer shapes the model, and you design accordingly.

The second is contributing to the platform: reusable Spark pipeline templates, quality framework extensions, onboarding & backfill automation, cost tooling, observability components. When you solve a problem in a pipeline, you ask whether it can become a platform capability. The expectation is that each data mart you build is faster to deliver than the last - because of tools you helped create.

We expect engineers here to use AI tools as a genuine part of how they work - for development, debugging, documentation, and quality. Not occasionally. This is how the team moves fast without cutting corners.

You Should Have:

AI Engineering

  • Reaching for AI tools first is a habit, not an occasional practice.
  • Quality: learned anomaly detection and LLM-generated assertions from schema and business context - coverage grows with the data, not with manual effort (a sketch follows this list)
  • Observability: AI-assisted monitors that learn baseline behaviour; LLMs to enrich lineage metadata and auto-document transformation logic
  • Development: LLM-assisted coding for generation, test scaffolding, and refactoring - you prompt well, validate critically, and know what needs human review before touching production data
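
A minimal sketch of the assertion-generation idea above. The `call_llm(prompt) -> str` client is a placeholder for whatever model access the team uses (not a specific API), and nothing the model drafts runs without human review:

```python
import json

def suggest_assertions(table: str, schema: dict, sample_rows: list[dict], call_llm) -> list[dict]:
    """Draft quality assertions from schema and sample data; a human reviews them."""
    prompt = (
        "Suggest data-quality assertions for a warehouse table.\n"
        f"Table: {table}\nSchema: {json.dumps(schema)}\n"
        f"Sample rows: {json.dumps(sample_rows[:20], default=str)}\n"
        "Respond with a JSON list of objects with keys: column, check "
        "(one of not_null, unique, accepted_values, range), params."
    )
    drafts = json.loads(call_llm(prompt))
    # Keep only checks the quality framework actually supports; everything
    # that survives still goes through PR review before touching production.
    supported = {"not_null", "unique", "accepted_values", "range"}
    return [d for d in drafts if isinstance(d, dict) and d.get("check") in supported]
```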

Data Modelling

Deep dimensional modelling: grain, additive vs semi-additive measures, conformed dimensions, SCD patterns, and when star schema is the wrong answer. You know when to normalise vs denormalise based on access patterns, not convention. You understand that the storage engine shapes the model - a Delta Lake batch fact table and a Pinot real-time fact table are different designs - and that near-real-time freshness changes the modelling problem meaningfully. Before writing code, you resolve the right questions with stakeholders: grain, shared dimensions, and where denormalisation is justified by query pattern.
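
To make "resolve the grain first" concrete, a minimal PySpark sketch - the table and column names (stg_payments, dim_merchant, fct_payments) are hypothetical - that asserts the agreed grain before publishing rather than trusting a dimension join not to fan it out:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

GRAIN = ["payment_id"]  # one row per payment, agreed with stakeholders up front

payments = spark.table("stg_payments")
dim_merchant = spark.table("dim_merchant").where("is_current")

# A conformed-dimension join: if dim_merchant ever carries two "current"
# rows per merchant_id, this left join silently fans out the grain.
fct = payments.join(
    dim_merchant.select("merchant_id", "merchant_key"), "merchant_id", "left"
)

# So assert the declared grain before publishing instead of trusting the join.
violations = fct.groupBy(*GRAIN).count().where(F.col("count") > 1)
assert violations.limit(1).count() == 0, f"grain violated on {GRAIN}"

fct.write.mode("overwrite").saveAsTable("fct_payments")
```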

SQL & Transformation

You work in a version-controlled transformation framework - dbt or equivalent - and treat data models as software. Materialisation strategies, incremental models, ref-based dependencies, and model-level tests are part of every delivery. PR-based code review is the standard gate before anything reaches production. Every model is documented, tested, and reviewable by someone who wasn't in the room when it was written.
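
In the same "data models as software" spirit, a minimal sketch of a unit-tested transformation, written in PySpark with pytest since the posting names both; the function and column names are hypothetical, and in a dbt project the equivalent gate is model-level tests:

```python
import pytest
from pyspark.sql import SparkSession, DataFrame, functions as F

def daily_active_users(events: DataFrame) -> DataFrame:
    """One row per day with the count of distinct users who logged in."""
    return (
        events.where("event_type = 'login'")
        .groupBy(F.to_date("event_ts").alias("day"))
        .agg(F.countDistinct("user_id").alias("dau"))
    )

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").getOrCreate()

def test_daily_active_users_deduplicates(spark):
    events = spark.createDataFrame(
        [("u1", "login", "2024-01-01 08:00:00"),
         ("u1", "login", "2024-01-01 09:00:00"),   # same user, same day
         ("u2", "click", "2024-01-01 10:00:00")],  # not a login event
        ["user_id", "event_type", "event_ts"],
    )
    result = daily_active_users(events).collect()
    assert len(result) == 1 and result[0]["dau"] == 1
```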

BI Layer Awareness

You design with downstream consumption in mind. A mart that is technically correct but difficult to query in Superset is an incomplete delivery. You validate that what you built answers the question it was meant to answer, and work with analysts to catch consumption friction early.

Apache Spark

  • You write production PySpark: modular, tested, version controlled. You can read a Spark UI, diagnose OOM and skew, and tune partition count, joins, executor sizing, and file layout deliberately (a skew-handling sketch follows this list).
  • You understand the performance implications of file format, compression, and sort order on downstream Trino query cost and you make those choices deliberately, not by default.
  • Comfortable with the PySpark APIs, Pydantic for config, and pytest for pipeline testing - production-grade Python, not scripts that work once.
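
For the skew bullet, a minimal sketch of one standard remedy - salting a hot join key - with hypothetical table names. On Spark 3.x, adaptive query execution's skew handling (spark.sql.adaptive.skewJoin.enabled) is usually the first thing to try before hand-salting:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

SALT_BUCKETS = 16  # sized from the skew visible in the Spark UI stage view

events = spark.table("events")      # large side, heavily skewed on user_id
profiles = spark.table("profiles")  # smaller side, one row per user_id

# Scatter each hot key across SALT_BUCKETS partitions on the big side...
salted_events = events.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

# ...and replicate the small side once per salt value so every row still matches.
salts = spark.range(SALT_BUCKETS).select(F.col("id").cast("int").alias("salt"))
salted_profiles = profiles.crossJoin(salts)

joined = salted_events.join(salted_profiles, ["user_id", "salt"], "left").drop("salt")
```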

Core Tools

  • Trino - write efficient models that perform well at scale; understand partition pruning, file count impact, and query cost at the model level
  • Python - production-grade: modular, tested, linted. Pydantic for config, pytest for pipeline testing
  • dbt or equivalent transformation framework in production

Pipeline Engineering

Idempotency is a design principle, not an afterthought. You have implemented SCD Type 2 or equivalent and can articulate the trade-offs. When issues arise, you diagnose fast, feeding query plans and stack traces into AI tools to cut time-to-root-cause while knowing which outputs to verify before acting.
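
For concreteness, a minimal SCD Type 2 sketch on Delta Lake using the common merge-key trick, so one MERGE both closes the old row and inserts the new version. Table and column names (dim_customer, staging_customers, address) are hypothetical; rerunning it against the same staging snapshot is a no-op, which is the idempotency point:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
updates = spark.table("staging_customers")  # latest snapshot, one row per customer_id

dim = DeltaTable.forName(spark, "dim_customer")
current = dim.toDF().where("is_current")

# Rows whose tracked attribute changed need two actions - close the old
# version and insert a new one - so they appear twice in the merge source.
changed = (
    updates.alias("s")
    .join(current.alias("t"), "customer_id")
    .where("s.address <> t.address")
    .select("s.*")
)

# merge_key equals customer_id for the close-out copy and NULL for the
# insert copy, so the second copy never matches and always inserts.
staged = updates.withColumn("merge_key", F.col("customer_id")).unionByName(
    changed.withColumn("merge_key", F.lit(None).cast("string"))
)

(
    dim.alias("t")
    .merge(staged.alias("s"), "t.customer_id = s.merge_key AND t.is_current")
    .whenMatchedUpdate(
        condition="t.address <> s.address",
        set={"is_current": F.lit(False), "valid_to": F.current_date()},
    )
    .whenNotMatchedInsert(
        values={
            "customer_id": "s.customer_id",
            "address": "s.address",
            "valid_from": F.current_date(),
            "valid_to": F.lit(None).cast("date"),
            "is_current": F.lit(True),
        }
    )
    .execute()
)
```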

Data Quality Engineering

Custom quality assertions beyond framework defaults: volume checks, distribution comparisons, referential integrity. You can implement anomaly detection on metric time series without a library doing all the work. You apply AI-driven pattern detection where deterministic rules fall short, and use LLMs to generate quality rule suggestions from schema and sample data without skipping the review.
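
As a sketch of "without a library doing all the work": a trailing modified z-score (median and MAD, the Iglewicz-Hoberg statistic) over a daily metric series. The window and threshold are illustrative, not tuned values:

```python
import pandas as pd

def flag_anomalies(series: pd.Series, window: int = 28, threshold: float = 3.5) -> pd.Series:
    """Boolean mask: True where a point is an outlier vs its trailing window."""
    trailing = series.shift(1).rolling(window, min_periods=window // 2)
    median = trailing.median()
    # Median absolute deviation of the trailing window; robust to the very
    # outliers we are trying to detect, unlike a rolling standard deviation.
    mad = trailing.apply(lambda w: (w - w.median()).abs().median(), raw=False)
    modified_z = 0.6745 * (series - median) / mad.replace(0, float("nan"))
    return modified_z.abs() > threshold

# e.g. flags = flag_anomalies(metrics["daily_disbursement_volume"])
```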

Product and Business Sense

You think in metrics before tables. You ask what question someone is trying to answer before asking what columns they need, and push back on underspecified requirements with precision. Familiarity with fintech metrics - KYC approvals, repayment rates, settlement ratios, disbursement volumes, conversion rates, cohort retention - is a strong plus.

Nice to Have

  • Apache Iceberg or Delta Lake - schema evolution, time travel, compaction
  • DataHub or Apache Atlas for lineage and data cataloguing
  • Prior experience in fintech or a regulated environment where data governance had compliance implications

Life at slice:

Life so good, you'd think we're kidding

  • Competitive salaries. Period.
  • Extensive medical insurance that looks out for our employees & their dependants. We'll love you and take care of you - our promise.
  • Flexible working hours. Just don't call us at 3AM, we like our sleep schedule.
  • Tailored vacation & leave policies so that you enjoy every important moment in your life.
  • A reward system that celebrates hard work and milestones throughout the year. Expect a gift coming your way anytime you kill it here.
  • Learning and upskilling opportunities. Seriously, not kidding.
  • Good food, games, and a cool office to make you feel like home.
  • An environment so good, you'll forget the saying that colleagues can't be your friends.
