About the position
As a Senior Data Engineer on the MIDAS (Management Integration & Data Analytics System) Data Platform Team, you will build from scratch and maintain the central data hub connecting most of the systems within one of Japan's more innovative digital banks.
You will work with modern cloud-based data technologies to ingest data from various banking systems, apply complex business logic to it, and then serve it to downstream systems for enterprise management, regulatory reporting, risk management, and many other applications.
Given the high expectations placed on the banking domain, you will have the opportunity to work on complex data engineering challenges, including data quality, reconciliation across multiple systems, time-critical data processing, and complete traceability.
This is a senior individual contributor role where you will design and implement complex data pipelines, mentor mid-level engineers, and participate in architectural decisions for the platform.
This position involves employment with Money Forward, Inc., and a secondment to the new company (SMBC Money Forward Bank Preparatory Corporation). The evaluation system and employee benefits will follow the policies of Money Forward, Inc.
Who we are
We are a startup team partnering with Sumitomo Mitsui Financial Group and Sumitomo Mitsui Banking Corporation to establish a new digital bank. Our mission is to build embedded financial products from the ground up, with a strong focus on supporting small and medium-sized businesses (SMBs).
Development Structure
We operate in a small, agile team while collaborating closely with partners from the banking industry. The MIDAS team is growing rapidly, aiming to reach more than 10 data engineers within the year.
Technology Stack and Tools Used
- Cloud Infrastructure
  - AWS (primary cloud platform, Tokyo region)
  - S3 for data lake storage, with VPC networking for secure connectivity
  - AWS IAM for security and access management
- Data Lakehouse Architecture
  - Modern lakehouse architecture using Delta Lake or Apache Iceberg for ACID transactions, time travel, and schema evolution
  - Columnar storage formats (Parquet) optimized for analytics
  - Bronze/Silver/Gold medallion architecture for progressive data refinement (see the sketch below)
  - Partition strategies and Z-ordering for query performance
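
To give a concrete feel for the medallion pattern, here is a minimal sketch of a Bronze-to-Silver refinement step with Delta Lake on S3. The paths, column names, and partition key are illustrative assumptions, not the team's actual schema.

```python
# Minimal sketch of a Bronze -> Silver refinement step.
# Assumes a SparkSession configured with Delta Lake (delta-spark);
# paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze-to-silver-transactions").getOrCreate()

bronze = spark.read.format("delta").load("s3://example-lake/bronze/transactions")

silver = (
    bronze
    .dropDuplicates(["transaction_id"])                  # basic de-duplication
    .withColumn("booking_date", F.to_date("booked_at"))  # derive the partition column
    .filter(F.col("amount").isNotNull())                 # drop obviously broken rows
)

(
    silver.write.format("delta")
    .mode("overwrite")
    .partitionBy("booking_date")                         # partition for time-range queries
    .save("s3://example-lake/silver/transactions")
)

# On Delta Lake, files within partitions can additionally be co-located by a
# high-cardinality key, for example:
# spark.sql("OPTIMIZE delta.`s3://example-lake/silver/transactions` ZORDER BY (account_id)")
```

Z-ordering is shown only as a comment because the exact table format (Delta Lake vs. Apache Iceberg) is still a platform choice.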
- Orchestration & Processing
  - Managed workflow orchestration platforms (Amazon MWAA/Apache Airflow, Databricks Workflows, or similar); see the DAG sketch below
  - Distributed data processing with Apache Spark
  - Serverless compute options for cost optimization
  - Streaming and batch ingestion patterns (AutoLoader, scheduled jobs)
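
As an illustration of the orchestration layer, here is a minimal Airflow DAG sketch for a daily batch ingestion job. The DAG id, task callables, and schedule are assumptions for the example, not an actual MIDAS pipeline.

```python
# Minimal Airflow DAG sketch for a daily extract-and-land job (illustrative only).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_source_data(**context):
    """Placeholder: pull a daily extract from a source system API."""
    ...


def load_to_bronze(**context):
    """Placeholder: land the raw extract in the Bronze layer."""
    ...


with DAG(
    dag_id="daily_transactions_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_source_data)
    load = PythonOperator(task_id="load_bronze", python_callable=load_to_bronze)

    extract >> load
```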
- Data Transformation
  - dbt (data build tool) for SQL-based analytics engineering
  - Delta Live Tables or AWS Glue for declarative ETL pipelines
  - SQL and Python for data transformations
  - Incremental materialization strategies for efficiency (see the sketch below)
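
In dbt this is usually expressed as an incremental model; the sketch below shows the same incremental upsert pattern in Python using the Delta Lake MERGE API. The paths, join key, and watermark are illustrative assumptions.

```python
# Sketch of an incremental upsert into a Silver table.
# Assumes a SparkSession configured with Delta Lake; names are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental-silver-upsert").getOrCreate()

target = DeltaTable.forPath(spark, "s3://example-lake/silver/accounts")

# Only process records that arrived since the last successful run
# (the watermark value is illustrative; in practice it would be tracked per run).
updates = (
    spark.read.format("delta").load("s3://example-lake/bronze/accounts")
    .filter(F.col("updated_at") > "2024-01-01T00:00:00")
)

(
    target.alias("t")
    .merge(updates.alias("s"), "t.account_id = s.account_id")
    .whenMatchedUpdateAll()      # refresh changed accounts
    .whenNotMatchedInsertAll()   # insert new accounts
    .execute()
)
```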
- Query & Analytics
  - Serverless query engines (Amazon Athena, Databricks SQL, or Redshift Serverless)
  - Auto-scaling compute for variable workloads
  - Query result caching and optimization
  - REST APIs for data serving to downstream consumers (see the sketch below)
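
On the data serving side, a REST layer can be as simple as the FastAPI sketch below. The endpoint shape and the in-memory stand-in for a Gold-layer query are hypothetical; a real service would query Athena or Databricks SQL and add authentication, pagination, and caching.

```python
# Minimal sketch of a REST data-serving layer in front of the Gold layer (illustrative only).
from fastapi import FastAPI, HTTPException

app = FastAPI(title="data-serving-sketch")

# Stand-in for a Gold-layer lookup (e.g. an Athena or Databricks SQL query).
_FAKE_GOLD_BALANCES = {
    "ACC-001": {"account_id": "ACC-001", "balance": 1250000, "currency": "JPY"},
}


@app.get("/v1/accounts/{account_id}/balance")
def get_account_balance(account_id: str) -> dict:
    record = _FAKE_GOLD_BALANCES.get(account_id)
    if record is None:
        raise HTTPException(status_code=404, detail="account not found")
    return record
```

Run locally with `uvicorn main:app --reload` (assuming the file is named main.py).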
- Data Quality & Governance
  - Automated data quality frameworks (AWS Glue Data Quality, Delta Live Tables expectations, Great Expectations)
  - Cross-system reconciliation and validation logic (see the sketch below)
  - Fine-grained access control with column/row-level security (AWS Lake Formation or Unity Catalog)
  - Automated data lineage tracking for regulatory compliance
  - Audit logging and 10-year data retention policies
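
Cross-system reconciliation is a recurring theme in this role, for example validating CBS transactions against Mambu ledger balances. The sketch below shows the general shape of such a check in PySpark; the table locations, column names, and zero-tolerance policy are assumptions for illustration.

```python
# Sketch of a cross-system reconciliation check: daily CBS transaction totals
# compared against Mambu ledger movements per account.
# Assumes a SparkSession configured with Delta Lake; schemas are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cbs-mambu-reconciliation").getOrCreate()

cbs_tx = spark.read.format("delta").load("s3://example-lake/silver/cbs_transactions")
mambu = spark.read.format("delta").load("s3://example-lake/silver/mambu_ledger_movements")

cbs_daily = (
    cbs_tx.groupBy("account_id", "booking_date")
    .agg(F.sum("amount").alias("cbs_net_movement"))
)

mismatches = (
    cbs_daily.join(mambu, ["account_id", "booking_date"], "full_outer")
    .withColumn(
        "diff",
        F.abs(
            F.coalesce("cbs_net_movement", F.lit(0))
            - F.coalesce("ledger_net_movement", F.lit(0))
        ),
    )
    .filter(F.col("diff") > 0)  # exact match expected; any tolerance would be a policy decision
)

# Fail the pipeline (or raise an alert) when any account does not reconcile.
if mismatches.limit(1).count() > 0:
    raise ValueError("CBS vs Mambu reconciliation failed; inspect the mismatches output")
```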
- Business Intelligence
  - Amazon QuickSight and/or Databricks SQL Dashboards
  - Integration with enterprise BI tools (Tableau, Power BI, Looker)
- Development & DevOps
  - Languages: SQL (primary), Python
  - Version Control: GitHub
  - CI/CD: GitHub Actions
  - Infrastructure as Code: Terraform
  - Monitoring: CloudWatch, Databricks monitoring, or similar
  - AI-Assisted Development: Claude Code, GitHub Copilot, ChatGPT
Responsibilities
- Design and implement data pipelines to ingest data from multiple source systems (CBS, CLM, Mambu, LOS) using REST APIs or database connections
- Build and maintain Bronze/Silver/Gold layer transformations ensuring data quality, consistency, and performance
- Implement data quality checks and cross-system reconciliation logic (e.g., validating CBS transactions against Mambu ledger balances)
- Develop and optimize SQL queries and transformations using dbt or similar tools
- Design and implement data models for analytics and reporting use cases (ALM, ERM, regulatory reporting)
- Build REST APIs or data serving layers for downstream consumers
- Participate in architecture decisions for data platform components
- Write unit tests, integration tests, and data quality tests for pipelines (see the test sketch after this list)
- Monitor data pipeline performance, troubleshoot failures, and implement improvements
- Optimize query performance through partitioning strategies, Z-ordering, and query tuning
- Implement infrastructure as code for data platform components using Terraform
- Set up CI/CD pipelines for automated testing and deployment of data pipelines
- Mentor mid-level engineers and conduct code reviews
- Contribute to documentation and best practices for the team
- Collaborate with backend engineers to define API contracts and data schemas
- Work with Technical Lead on platform design and technology selection decisions
- Lead features and initiatives within the data platform
- Support EOD (End-of-Day) data collection processes that align with Zengin settlement timing
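
As referenced in the testing responsibility above, here is a minimal sketch of a unit test for a pipeline transformation using pytest and a local SparkSession. The transformation under test and its columns are hypothetical.

```python
# Sketch of a unit test for a pipeline transformation (illustrative only).
import pytest
from pyspark.sql import SparkSession, functions as F


def add_booking_date(df):
    """Example transformation: derive a booking_date column from a timestamp string."""
    return df.withColumn("booking_date", F.to_date("booked_at"))


@pytest.fixture(scope="session")
def spark():
    # Local, single-threaded session is enough for fast unit tests.
    return SparkSession.builder.master("local[1]").appName("pipeline-tests").getOrCreate()


def test_add_booking_date(spark):
    df = spark.createDataFrame(
        [("tx-1", "2024-04-01 09:30:00")],
        ["transaction_id", "booked_at"],
    )

    result = add_booking_date(df).collect()[0]

    assert str(result["booking_date"]) == "2024-04-01"
```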
Requirements
- 5+ years of experience in data engineering, analytics engineering, or a related data-focused role
- Strong proficiency in SQL and Python
- Hands-on experience building data pipelines using modern tools (Airflow, Spark, dbt, or similar)
- Experience with cloud data platforms (AWS, Azure, GCP) and storage systems (S3, ADLS, GCS)
- Strong understanding of data modeling techniques including dimensional modeling, data vault, or event-driven architectures
- Experience with data quality validation and testing frameworks
- Proven ability to debug and optimize slow queries and data processing jobs
- Experience with version control (Git) and CI/CD pipelines
- Understanding of data governance concepts: access control, audit logging, data lineage
- Strong problem-solving skills and ability to work independently
- Experience mentoring junior or mid-level engineers
- Excellent communication skills for collaborating with cross-functional teams
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
- Language ability: Japanese at Business level and/or English at Business level (TOEIC score of 700 or above)
Nice to haves
While not strictly required, let us know if you have any of the following.
- Experience in financial services, fintech, or other regulated industries
- Knowledge of banking domain concepts: core banking systems, payment processing, regulatory reporting, AML/transaction monitoring
- Experience implementing data platforms that comply with regulatory requirements (FISC Security Guidelines, FSA/BOJ reporting, GDPR, APPI)
- Hands-on experience with Databricks platform or AWS native data services
- Experience implementing cross-system reconciliation for financial data
- Experience with performance tuning: partitioning strategies, query optimization, cost management
- Experience building REST APIs with Python (FastAPI, Flask, or similar) for data serving
- Knowledge of streaming data pipelines (Kafka, Kinesis, or similar)
- Experience with Terraform
- Contributions to open-source data engineering projects
- Experience with BI tools (QuickSight, Tableau, Looker, Power BI)
- Experience leading technical initiatives from design through implementation
- Track record of improving data platform performance or reducing costs (provide specific metrics)