
Analytics Engineer
Location: Mumbai. Full-time
Experience: 2-4 years in fintech, NBFC, banking, inside sales, tele-sales, or customer success
About Apollo Finvest
We are a publicly listed, tech-first NBFC: think AWS, but for lending. Armed with advanced APIs and capital, we team up with the best fintechs to offer digital loans across the country. It's where finance meets tech, with a strong focus on innovation and customer experience.
Apollo Cash is our digital personal loan app designed to provide fast, seamless access to credit entirely through a mobile journey.
Role Summary
This role will be responsible for owning the end-to-end data-structuring layer across the organisation. The individual will transform large volumes of raw, unstructured, and semi-structured data (such as SMS, device, bureau, and app data) into clean, standardised, and analysis-ready datasets. These structured datasets will directly power risk analytics, fraud detection, marketing insights, collections strategy, and policy decisioning.
Key Objective of the Role
Ensure all raw lending data (SMS, Bureau, Device, AA, App logs) is captured, parsed, structured, and stored in a clean, analytics-ready format inside databases (PostgreSQL, DynamoDB, AWS stack) so that the Risk and Data Science teams can use it directly for feature creation, policy building, and portfolio monitoring.
Core Responsibilities
1. End-to-End Data Ownership
Design, build, and maintain end-to-end data pipelines (batch + streaming) using AWS-native services (Glue, Lambda, Step Functions, Kinesis, S3, Athena, Redshift, EMR/Spark, etc.): ingestion → parsing → structuring → storage
Work closely with Tech, Product, and Data Science to define what data should be captured
Maintain data documentation, data dictionaries, and schema governance
Ensure data quality, consistency, and version control
2. Unstructured Data Processing (Highest Priority)
Parse raw SMS dumps and categorise them into salary, EMI, loan apps, collections, credits, debits, OTP, etc.
Process device fingerprint, behavioural logs, and vendor data (FinBox, AA, Bureau APIs)
Convert JSON, logs, and raw API responses into structured feature tables
Build regex/keyword-based parsers for financial SMS classification
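To illustrate the regex/keyword-based SMS classification mentioned above, here is a minimal sketch in Python. The category names and patterns are illustrative only, not Apollo's actual taxonomy; in practice the rules would be defined with the Risk team and cover far more cases.

```python
import re

# Illustrative keyword patterns; the real taxonomy and rules would be
# defined with the Risk team, not hard-coded like this.
SMS_CATEGORIES = {
    "salary": re.compile(r"\bsalary\b", re.I),
    "emi": re.compile(r"\bemi\b|\binstal?lment\b", re.I),
    "credit": re.compile(r"\bcredited\b|\bdeposit(?:ed)?\b", re.I),
    "debit": re.compile(r"\bdebited\b|\bwithdraw(?:n|al)\b", re.I),
    "otp": re.compile(r"\botp\b|\bone[- ]time password\b", re.I),
}

def classify_sms(text: str) -> str:
    """Return the first matching category, or 'other'."""
    for category, pattern in SMS_CATEGORIES.items():
        if pattern.search(text):
            return category
    return "other"
```

At scale, a parser like this would run inside the ingestion pipeline and write a category label alongside each raw message, so downstream feature logic never touches unparsed text.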
3. Feature Implementation (From Risk & Data Science Team)
Implement feature creation logic provided by Risk/Data Science team
Translate business and policy logic into SQL/Python pipelines
Create reusable feature layers for underwriting, fraud, collections, and monitoring
Maintain a feature store for consistent model and policy usage
4. Lending Data Understanding (Domain-Specific Requirement)
Work with Bureau data
Structure SMS-derived financial variables (income, stress, EMI signals)
Work with Account Aggregator and bank transaction datasets
Understand fintech alternate data used in underwriting and fraud detection
5. Data Pipelines & Automation
Build and maintain ETL/ELT pipelines using Python & SQL
Create cron jobs for automated data ingestion and feature refresh
Automate vendor data pulls (Bureau, SMS SDK, AA, device data)
Ensure low-latency pipelines for real-time underwriting use cases
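A hedged sketch of the batch shape such an ingestion job might take: flatten a nested vendor API response into a feature row and upsert it into a table. All field and table names here are invented for illustration, and sqlite stands in for PostgreSQL; a cron entry (e.g. `0 2 * * *`) would invoke a module like this nightly.

```python
import sqlite3

def flatten_payload(payload: dict) -> dict:
    """Flatten a nested vendor API response into one feature row.
    Field names are invented for illustration."""
    return {
        "user_id": payload["user"]["id"],
        "bureau_score": payload.get("bureau", {}).get("score"),
        "active_loans": len(payload.get("loans", [])),
    }

def load_rows(conn, rows):
    """Idempotent upsert so a re-run of the job does not duplicate rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features "
        "(user_id TEXT PRIMARY KEY, bureau_score INTEGER, active_loans INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO features "
        "VALUES (:user_id, :bureau_score, :active_loans)",
        rows,
    )
    conn.commit()
```

The key design choice is idempotency: because vendor pulls and cron jobs can be retried, each load replaces by primary key rather than blindly appending.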
6. Database Structuring & Storage Architecture
Structure clean datasets in PostgreSQL (analytics layer)
Manage raw data storage in DynamoDB / S3 data lake
Design normalised and denormalised tables for risk analytics
Optimise database performance for large-scale query workloads
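One way to picture the normalised-plus-denormalised split described above: keep raw entities in narrow normalised tables, and expose a pre-joined, denormalised layer for risk queries. Table and column names are illustrative, and sqlite again stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised raw layer: one table per entity (names illustrative).
CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, user_id TEXT, amount REAL);
CREATE TABLE repayments (loan_id INTEGER, paid REAL);

-- Denormalised analytics layer: pre-joined for risk queries,
-- so analysts never rebuild the join themselves.
CREATE VIEW loan_summary AS
SELECT l.loan_id, l.user_id, l.amount,
       COALESCE(SUM(r.paid), 0) AS total_paid
FROM loans l LEFT JOIN repayments r ON r.loan_id = l.loan_id
GROUP BY l.loan_id;
""")
```

In a production warehouse the denormalised layer would typically be a materialised table refreshed by the pipeline rather than a live view, trading freshness for query speed.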
7. Dashboards & Readable Data Layer
Create analytics-ready datasets, write Metabase queries, and build dashboards (Metabase / Power BI)
Enable self-serve data access for Risk, Business, and Founders
Support ad-hoc analysis requirements from leadership
8. Cross-Functional Collaboration (Very Important)
The role requires close collaboration with the Data Science, Tech, Product, and Business teams to ensure reliable data pipelines, well-defined schemas, API integrations, logging architecture, and high data quality, enabling faster and more accurate decision-making across lending workflows.
Tech Stack (Current Environment)
AWS SERVICES
PostgreSQL (Primary analytics DB)
DynamoDB (Raw/NoSQL storage)
Python (Pandas, NumPy, ETL frameworks)
Advanced SQL
APIs, JSON, and Log Data Handling
Must-Have Skills
2-6 years of experience in Data Engineering / Analytics Engineering / Fintech Data roles
Strong Python and SQL (production level)
Experience handling unstructured data (SMS, logs, JSON, APIs)
Experience building data pipelines, schedulers, and cron jobs
Strong database design and data modelling skills
Ability to work in a startup environment with high ownership
Familiarity with modern platforms like Snowflake, Google BigQuery, or Amazon Redshift
Good to Have (Highly Preferred)
Experience in Lending / NBFC / Fintech domain
Experience working with Bureau, SMS, Device, or Banking data
Experience with streaming (Kafka/Kinesis) and orchestration (Airflow or Step Functions)
Experience with feature stores and risk analytics datasets
Knowledge of regex, NLP basics for SMS parsing
Experience supporting real-time decision engines / underwriting systems
Why This Role is Mission-Critical for Apollo Cash:
Apollo Cash captures raw SMS, Device, Bureau, AA, and App behavioural data at scale. Without proper structuring and pipelines, risk models, fraud rules, dashboards, and policies cannot function effectively. This role will be the single owner responsible for transforming raw lending data into a clean, usable, production-ready data layer powering underwriting, fraud detection, portfolio monitoring, and business analytics.
Job ID: 143979017