
Analytics Engineer
Location: Mumbai. Full-time
Experience: 2-4 years in fintech, NBFC, banking, inside sales, tele-sales, or customer success
About Apollo Finvest
We are a publicly listed, tech-first NBFC: think AWS, but for lending. Armed with advanced APIs and capital, we team up with the best fintechs to offer digital loans across the country. It's where finance meets tech, with a strong focus on innovation and customer experience.
Apollo Cash is our digital personal loan app designed to provide fast, seamless access to credit entirely through a mobile journey.
Role Summary
This role will be responsible for owning the end-to-end data-structuring layer across the organisation. The individual will transform large volumes of raw, unstructured, and semi-structured data (such as SMS, device, bureau, and app data) into clean, standardised, and analysis-ready datasets. These structured datasets will directly power risk analytics, fraud detection, marketing insights, collections strategy, and policy decisioning.
Key Objective of the Role
Ensure all raw lending data (SMS, Bureau, Device, AA, App logs) is captured, parsed, structured, and stored in a clean, analytics-ready format inside databases (PostgreSQL, DynamoDB, AWS stack) so that the Risk and Data Science teams can use it directly for feature creation, policy building, and portfolio monitoring.
Core Responsibilities
1. End-to-End Data Ownership
Design, build, and maintain end-to-end data pipelines (batch + streaming) using AWS-native services (Glue, Lambda, Step Functions, Kinesis, S3, Athena, Redshift, EMR/Spark, etc.): ingestion → parsing → structuring → storage
Work closely with Tech, Product, and Data Science to define what data should be captured
Maintain data documentation, data dictionaries, and schema governance
Ensure data quality, consistency, and version control
2. Unstructured Data Processing (Highest Priority)
Parse raw SMS dumps and categorise them into salary, EMI, loan apps, collections, credits, debits, OTP, etc.
Process device fingerprint, behavioural logs, and vendor data (FinBox, AA, Bureau APIs)
Convert JSON, logs, and raw API responses into structured feature tables
Build regex/keyword-based parsers for financial SMS classification
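To illustrate the regex/keyword-based SMS classification mentioned above, here is a minimal sketch in Python. The category names and patterns are illustrative only, not Apollo's actual taxonomy; in practice the rules would be defined with the Risk team and cover far more cases.

```python
import re

# Illustrative keyword patterns; the real taxonomy and rules would be
# defined with the Risk team, not hard-coded like this.
SMS_CATEGORIES = {
    "salary": re.compile(r"\bsalary\b", re.I),
    "emi": re.compile(r"\bemi\b|\binstal?lment\b", re.I),
    "credit": re.compile(r"\bcredited\b|\bdeposit(?:ed)?\b", re.I),
    "debit": re.compile(r"\bdebited\b|\bwithdraw(?:n|al)\b", re.I),
    "otp": re.compile(r"\botp\b|\bone[- ]time password\b", re.I),
}

def classify_sms(text: str) -> str:
    """Return the first matching category, or 'other'."""
    for category, pattern in SMS_CATEGORIES.items():
        if pattern.search(text):
            return category
    return "other"
```

At scale, a parser like this would run inside the ingestion pipeline and write a category label alongside each raw message, so downstream feature logic never touches unparsed text.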
3. Feature Implementation (From Risk & Data Science Team)
Implement feature creation logic provided by Risk/Data Science team
Translate business and policy logic into SQL/Python pipelines
Create reusable feature layers for underwriting, fraud, collections, and monitoring
Maintain a feature store for consistent model and policy usage
4. Lending Data Understanding (Domain-Specific Requirement)
Work with Bureau data
Structure SMS-derived financial variables (income, stress, EMI signals)
Work with Account Aggregator and bank transaction datasets
Understand fintech alternate data used in underwriting and fraud detection
5. Data Pipelines & Automation
Build and maintain ETL/ELT pipelines using Python & SQL
Create cron jobs for automated data ingestion and feature refresh
Automate vendor data pulls (Bureau, SMS SDK, AA, device data)
Ensure low-latency pipelines for real-time underwriting use cases
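A hedged sketch of the batch shape such an ingestion job might take: flatten a nested vendor API response into a feature row and upsert it into a table. All field and table names here are invented for illustration, and sqlite stands in for PostgreSQL; a cron entry (e.g. `0 2 * * *`) would invoke a module like this nightly.

```python
import sqlite3

def flatten_payload(payload: dict) -> dict:
    """Flatten a nested vendor API response into one feature row.
    Field names are invented for illustration."""
    return {
        "user_id": payload["user"]["id"],
        "bureau_score": payload.get("bureau", {}).get("score"),
        "active_loans": len(payload.get("loans", [])),
    }

def load_rows(conn, rows):
    """Idempotent upsert so a re-run of the job does not duplicate rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features "
        "(user_id TEXT PRIMARY KEY, bureau_score INTEGER, active_loans INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO features "
        "VALUES (:user_id, :bureau_score, :active_loans)",
        rows,
    )
    conn.commit()
```

The key design choice is idempotency: because vendor pulls and cron jobs can be retried, each load replaces by primary key rather than blindly appending.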
6. Database Structuring & Storage Architecture
Structure clean datasets in PostgreSQL (analytics layer)
Manage raw data storage in DynamoDB / S3 data lake
Design normalised and denormalised tables for risk analytics
Optimise database performance for large-scale query workloads
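One way to picture the normalised-plus-denormalised split described above: keep raw entities in narrow normalised tables, and expose a pre-joined, denormalised layer for risk queries. Table and column names are illustrative, and sqlite again stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised raw layer: one table per entity (names illustrative).
CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, user_id TEXT, amount REAL);
CREATE TABLE repayments (loan_id INTEGER, paid REAL);

-- Denormalised analytics layer: pre-joined for risk queries,
-- so analysts never rebuild the join themselves.
CREATE VIEW loan_summary AS
SELECT l.loan_id, l.user_id, l.amount,
       COALESCE(SUM(r.paid), 0) AS total_paid
FROM loans l LEFT JOIN repayments r ON r.loan_id = l.loan_id
GROUP BY l.loan_id;
""")
```

In a production warehouse the denormalised layer would typically be a materialised table refreshed by the pipeline rather than a live view, trading freshness for query speed.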
7. Dashboards & Readable Data Layer
Create analytics-ready datasets, write Metabase queries, and build dashboards (Metabase / Power BI)
Enable self-serve data access for Risk, Business, and Founders
Support ad-hoc analysis requirements from leadership
8. Cross-Functional Collaboration (Very Important)
The role requires close collaboration with the Data Science, Tech, Product, and Business teams to ensure reliable data pipelines, well-defined schemas, API integrations, logging architecture, and high data quality, enabling faster and more accurate decision-making across lending workflows.
Tech Stack (Current Environment)
AWS SERVICES
PostgreSQL (Primary analytics DB)
DynamoDB (Raw/NoSQL storage)
Python (Pandas, NumPy, ETL frameworks)
Advanced SQL
APIs, JSON, and Log Data Handling
Must-Have Skills
2-6 years of experience in Data Engineering / Analytics Engineering / Fintech Data roles
Strong Python and SQL (production level)
Experience handling unstructured data (SMS, logs, JSON, APIs)
Experience building data pipelines, schedulers, and cron jobs
Strong database design and data modelling skills
Ability to work in a startup environment with high ownership
Familiarity with modern platforms like Snowflake, Google BigQuery, or Amazon Redshift
Good to Have (Highly Preferred)
Experience in Lending / NBFC / Fintech domain
Experience working with Bureau, SMS, Device, or Banking data
Experience with streaming (Kafka/Kinesis) and orchestration (Airflow or Step Functions)
Experience with feature stores and risk analytics datasets
Knowledge of regex, NLP basics for SMS parsing
Experience supporting real-time decision engines / underwriting systems
Why This Role is Mission-Critical for Apollo Cash:
Apollo Cash captures raw SMS, Device, Bureau, AA, and App behavioural data at scale. Without proper structuring and pipelines, risk models, fraud rules, dashboards, and policies cannot function effectively. This role will be the single owner responsible for transforming raw lending data into a clean, usable, production-ready data layer powering underwriting, fraud detection, portfolio monitoring, and business analytics.
Job ID: 143979017