About the position
As a Senior Data Engineer on the MIDAS (Management Integration & Data Analytics System) Data Platform Team, you will build from scratch and maintain the central data hub connecting most of the systems within one of Japan's more innovative digital banks.
You will work with modern cloud-based data technologies to ingest data from various banking systems, apply complex business logic to it, and then serve it to downstream systems for enterprise management, regulatory reporting, risk management, and many other applications.
Given the high expectations placed on the banking domain, you will have the opportunity to work on complex data engineering challenges, including data quality, reconciliation across multiple systems, time-critical data processing, and complete traceability.
This is a senior individual contributor role where you will design and implement complex data pipelines, mentor mid-level engineers, and participate in architectural decisions for the platform.
This position involves employment with Money Forward, Inc., and a secondment to the new company (SMBC Money Forward Bank Preparatory Corporation). The evaluation system and employee benefits will follow the policies of Money Forward, Inc.
Who we are
We are a startup team partnering with Sumitomo Mitsui Financial Group and Sumitomo Mitsui Banking Corporation to establish a new digital bank. Our mission is to build embedded financial products from the ground up, with a strong focus on supporting small and medium-sized businesses (SMBs).
Development Structure
We operate in a small, agile team while collaborating closely with partners from the banking industry. The MIDAS team is growing rapidly, aiming to reach more than 10 data engineers within the year.
Technology Stack and Tools Used
- Cloud Infrastructure
  - AWS (primary cloud platform, Tokyo region)
  - S3 for data lake storage, with VPC networking for secure connectivity
  - AWS IAM for security and access management
- Data Lakehouse Architecture
  - Modern lakehouse architecture using Delta Lake or Apache Iceberg for ACID transactions, time travel, and schema evolution
  - Columnar storage formats (Parquet) optimized for analytics
  - Bronze/Silver/Gold medallion architecture for progressive data refinement (see the sketch below)
  - Partition strategies and Z-ordering for query performance
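
To give a concrete feel for the medallion pattern, here is a minimal sketch of a Bronze-to-Silver refinement step with Delta Lake on S3. The paths, column names, and partition key are illustrative assumptions, not the team's actual schema.

```python
# Minimal sketch of a Bronze -> Silver refinement step.
# Assumes a SparkSession configured with Delta Lake (delta-spark);
# paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze-to-silver-transactions").getOrCreate()

bronze = spark.read.format("delta").load("s3://example-lake/bronze/transactions")

silver = (
    bronze
    .dropDuplicates(["transaction_id"])                  # basic de-duplication
    .withColumn("booking_date", F.to_date("booked_at"))  # derive the partition column
    .filter(F.col("amount").isNotNull())                 # drop obviously broken rows
)

(
    silver.write.format("delta")
    .mode("overwrite")
    .partitionBy("booking_date")                         # partition for time-range queries
    .save("s3://example-lake/silver/transactions")
)

# On Delta Lake, files within partitions can additionally be co-located by a
# high-cardinality key, for example:
# spark.sql("OPTIMIZE delta.`s3://example-lake/silver/transactions` ZORDER BY (account_id)")
```

Z-ordering is shown only as a comment because the exact table format (Delta Lake vs. Apache Iceberg) is still a platform choice.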
- Orchestration & Processing
  - Managed workflow orchestration platforms (Amazon MWAA/Apache Airflow, Databricks Workflows, or similar); see the DAG sketch below
  - Distributed data processing with Apache Spark
  - Serverless compute options for cost optimization
  - Streaming and batch ingestion patterns (AutoLoader, scheduled jobs)
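
As an illustration of the orchestration layer, here is a minimal Airflow DAG sketch for a daily batch ingestion job. The DAG id, task callables, and schedule are assumptions for the example, not an actual MIDAS pipeline.

```python
# Minimal Airflow DAG sketch for a daily extract-and-land job (illustrative only).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_source_data(**context):
    """Placeholder: pull a daily extract from a source system API."""
    ...


def load_to_bronze(**context):
    """Placeholder: land the raw extract in the Bronze layer."""
    ...


with DAG(
    dag_id="daily_transactions_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_source_data)
    load = PythonOperator(task_id="load_bronze", python_callable=load_to_bronze)

    extract >> load
```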
- Data Transformation
  - dbt (data build tool) for SQL-based analytics engineering
  - Delta Live Tables or AWS Glue for declarative ETL pipelines
  - SQL and Python for data transformations
  - Incremental materialization strategies for efficiency (see the sketch below)
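
In dbt this is usually expressed as an incremental model; the sketch below shows the same incremental upsert pattern in Python using the Delta Lake MERGE API. The paths, join key, and watermark are illustrative assumptions.

```python
# Sketch of an incremental upsert into a Silver table.
# Assumes a SparkSession configured with Delta Lake; names are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental-silver-upsert").getOrCreate()

target = DeltaTable.forPath(spark, "s3://example-lake/silver/accounts")

# Only process records that arrived since the last successful run
# (the watermark value is illustrative; in practice it would be tracked per run).
updates = (
    spark.read.format("delta").load("s3://example-lake/bronze/accounts")
    .filter(F.col("updated_at") > "2024-01-01T00:00:00")
)

(
    target.alias("t")
    .merge(updates.alias("s"), "t.account_id = s.account_id")
    .whenMatchedUpdateAll()      # refresh changed accounts
    .whenNotMatchedInsertAll()   # insert new accounts
    .execute()
)
```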
- Query & Analytics
  - Serverless query engines (Amazon Athena, Databricks SQL, or Redshift Serverless)
  - Auto-scaling compute for variable workloads
  - Query result caching and optimization
  - REST APIs for data serving to downstream consumers (see the sketch below)
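
On the data serving side, a REST layer can be as simple as the FastAPI sketch below. The endpoint shape and the in-memory stand-in for a Gold-layer query are hypothetical; a real service would query Athena or Databricks SQL and add authentication, pagination, and caching.

```python
# Minimal sketch of a REST data-serving layer in front of the Gold layer (illustrative only).
from fastapi import FastAPI, HTTPException

app = FastAPI(title="data-serving-sketch")

# Stand-in for a Gold-layer lookup (e.g. an Athena or Databricks SQL query).
_FAKE_GOLD_BALANCES = {
    "ACC-001": {"account_id": "ACC-001", "balance": 1250000, "currency": "JPY"},
}


@app.get("/v1/accounts/{account_id}/balance")
def get_account_balance(account_id: str) -> dict:
    record = _FAKE_GOLD_BALANCES.get(account_id)
    if record is None:
        raise HTTPException(status_code=404, detail="account not found")
    return record
```

Run locally with `uvicorn main:app --reload` (assuming the file is named main.py).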
- Data Quality & Governance
  - Automated data quality frameworks (AWS Glue Data Quality, Delta Live Tables expectations, Great Expectations)
  - Cross-system reconciliation and validation logic (see the sketch below)
  - Fine-grained access control with column/row-level security (AWS Lake Formation or Unity Catalog)
  - Automated data lineage tracking for regulatory compliance
  - Audit logging and 10-year data retention policies
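
Cross-system reconciliation is a recurring theme in this role, for example validating CBS transactions against Mambu ledger balances. The sketch below shows the general shape of such a check in PySpark; the table locations, column names, and zero-tolerance policy are assumptions for illustration.

```python
# Sketch of a cross-system reconciliation check: daily CBS transaction totals
# compared against Mambu ledger movements per account.
# Assumes a SparkSession configured with Delta Lake; schemas are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cbs-mambu-reconciliation").getOrCreate()

cbs_tx = spark.read.format("delta").load("s3://example-lake/silver/cbs_transactions")
mambu = spark.read.format("delta").load("s3://example-lake/silver/mambu_ledger_movements")

cbs_daily = (
    cbs_tx.groupBy("account_id", "booking_date")
    .agg(F.sum("amount").alias("cbs_net_movement"))
)

mismatches = (
    cbs_daily.join(mambu, ["account_id", "booking_date"], "full_outer")
    .withColumn(
        "diff",
        F.abs(
            F.coalesce("cbs_net_movement", F.lit(0))
            - F.coalesce("ledger_net_movement", F.lit(0))
        ),
    )
    .filter(F.col("diff") > 0)  # exact match expected; any tolerance would be a policy decision
)

# Fail the pipeline (or raise an alert) when any account does not reconcile.
if mismatches.limit(1).count() > 0:
    raise ValueError("CBS vs Mambu reconciliation failed; inspect the mismatches output")
```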
- Business Intelligence
  - Amazon QuickSight and/or Databricks SQL Dashboards
  - Integration with enterprise BI tools (Tableau, Power BI, Looker)
- Development & DevOps
  - Languages: SQL (primary), Python
  - Version Control: GitHub
  - CI/CD: GitHub Actions
  - Infrastructure as Code: Terraform
  - Monitoring: CloudWatch, Databricks monitoring, or similar
  - AI-Assisted Development: Claude Code, GitHub Copilot, ChatGPT
Responsibilities
- Design and implement data pipelines to ingest data from multiple source systems (CBS, CLM, Mambu, LOS) using REST APIs or database connections
- Build and maintain Bronze/Silver/Gold layer transformations ensuring data quality, consistency, and performance
- Implement data quality checks and cross-system reconciliation logic (e.g., validating CBS transactions against Mambu ledger balances)
- Develop and optimize SQL queries and transformations using dbt or similar tools
- Design and implement data models for analytics and reporting use cases (ALM, ERM, regulatory reporting)
- Build REST APIs or data serving layers for downstream consumers
- Participate in architecture decisions for data platform components
- Write unit tests, integration tests, and data quality tests for pipelines (see the test sketch after this list)
- Monitor data pipeline performance, troubleshoot failures, and implement improvements
- Optimize query performance through partitioning strategies, Z-ordering, and query tuning
- Implement infrastructure as code for data platform components using Terraform
- Set up CI/CD pipelines for automated testing and deployment of data pipelines
- Mentor mid-level engineers and conduct code reviews
- Contribute to documentation and best practices for the team
- Collaborate with backend engineers to define API contracts and data schemas
- Work with Technical Lead on platform design and technology selection decisions
- Lead features and initiatives within the data platform
- Support EOD (End-of-Day) data collection processes that align with Zengin settlement timing
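
As referenced in the testing responsibility above, here is a minimal sketch of a unit test for a pipeline transformation using pytest and a local SparkSession. The transformation under test and its columns are hypothetical.

```python
# Sketch of a unit test for a pipeline transformation (illustrative only).
import pytest
from pyspark.sql import SparkSession, functions as F


def add_booking_date(df):
    """Example transformation: derive a booking_date column from a timestamp string."""
    return df.withColumn("booking_date", F.to_date("booked_at"))


@pytest.fixture(scope="session")
def spark():
    # Local, single-threaded session is enough for fast unit tests.
    return SparkSession.builder.master("local[1]").appName("pipeline-tests").getOrCreate()


def test_add_booking_date(spark):
    df = spark.createDataFrame(
        [("tx-1", "2024-04-01 09:30:00")],
        ["transaction_id", "booked_at"],
    )

    result = add_booking_date(df).collect()[0]

    assert str(result["booking_date"]) == "2024-04-01"
```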
Requirements
- 5+ years of experience in data engineering, analytics engineering, or a related data-focused role
- Strong proficiency in SQL and Python
- Hands-on experience building data pipelines using modern tools (Airflow, Spark, dbt, or similar)
- Experience with cloud data platforms (AWS, Azure, GCP) and storage systems (S3, ADLS, GCS)
- Strong understanding of data modeling techniques including dimensional modeling, data vault, or event-driven architectures
- Experience with data quality validation and testing frameworks
- Proven ability to debug and optimize slow queries and data processing jobs
- Experience with version control (Git) and CI/CD pipelines
- Understanding of data governance concepts: access control, audit logging, data lineage
- Strong problem-solving skills and ability to work independently
- Experience mentoring junior or mid-level engineers
- Excellent communication skills for collaborating with cross-functional teams
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
- Language ability: Japanese at Business level and/or English at Business level (TOEIC score of 700 or above)
Nice to haves
While not strictly required, let us know if you have any of the following.
- Experience in financial services, fintech, or other regulated industries
- Knowledge of banking domain concepts: core banking systems, payment processing, regulatory reporting, AML/transaction monitoring
- Experience implementing data platforms that comply with regulatory requirements (FISC Security Guidelines, FSA/BOJ reporting, GDPR, APPI)
- Hands-on experience with Databricks platform or AWS native data services
- Experience implementing cross-system reconciliation for financial data
- Experience with performance tuning: partitioning strategies, query optimization, cost management
- Experience building REST APIs with Python (FastAPI, Flask, or similar) for data serving
- Knowledge of streaming data pipelines (Kafka, Kinesis, or similar)
- Experience with Terraform
- Contributions to open-source data engineering projects
- Experience with BI tools (QuickSight, Tableau, Looker, Power BI)
- Experience leading technical initiatives from design through implementation
- Track record of improving data platform performance or reducing costs (provide specific metrics)