Finrep AI

Backend Engineer, Data Infrastructure

  • Posted a day ago

Job Description

About Finrep

Finrep is an AI-powered SEC financial reporting platform used by public companies to prepare, review, and file 10-Qs, 10-Ks, and other disclosure documents. We are seed-stage, backed by Accel, and serve 10+ public company customers today. The engineering team is small and high-ownership. Every person here builds core infrastructure, not periphery.

The Role

We are hiring a backend engineer to own the data infrastructure that powers Finrep's core product: ingestion and parsing of SEC filings, XBRL taxonomies, and financial disclosure documents; ETL orchestration; search indexing; and data synchronization across systems.

You will work with complex, semi-structured financial data (not clean relational rows) and build pipelines that need to be reliable, idempotent, and observable. This is a founding-level role. You will shape how Finrep's data layer scales from here.
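To give a flavor of the semi-structured data involved, here is a minimal sketch of flattening facts from an XBRL-style XML instance into rows. The namespace, tag names, and document shape are simplified illustrations for this posting, not the real us-gaap taxonomy or Finrep's actual parser.

```python
# Minimal sketch: extracting numeric facts from an XBRL-like XML instance.
# The namespace and concept names below are simplified stand-ins.
import xml.etree.ElementTree as ET

SAMPLE = """
<xbrl xmlns:gaap="http://example.com/gaap">
  <gaap:Revenues contextRef="Q3-2024" unitRef="usd" decimals="-3">1234000</gaap:Revenues>
  <gaap:NetIncomeLoss contextRef="Q3-2024" unitRef="usd" decimals="-3">56000</gaap:NetIncomeLoss>
</xbrl>
"""

def extract_facts(xml_text: str) -> list[dict]:
    """Flatten tagged facts into rows of (concept, context, unit, value)."""
    root = ET.fromstring(xml_text)
    facts = []
    for el in root:
        # ElementTree expands namespaces to {uri}localname
        concept = el.tag.split("}")[-1]
        facts.append({
            "concept": concept,
            "context": el.get("contextRef"),
            "unit": el.get("unitRef"),
            "value": float(el.text),
        })
    return facts

facts = extract_facts(SAMPLE)
```

Real filings nest contexts, dimensions, and footnotes far more deeply than this, which is exactly why the posting stresses document-oriented data over flat tables.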

What You Will Do

  • Build and maintain backend services using Python, Django, and Django REST Framework
  • Design and operate ETL pipelines on Prefect, deployed on GCP Cloud Run
  • Build ingestion and parsing pipelines for SEC 10-Q/10-K filings, XBRL taxonomies, and disclosure documents
  • Build and optimize search infrastructure on OpenSearch: indexing, relevance tuning, query performance
  • Design and maintain CDC pipelines for data synchronization between PostgreSQL and OpenSearch
  • Work with Pub/Sub and Celery for async processing, background jobs, and task orchestration
  • Instrument observability, improve reliability, and optimize cost across data infrastructure
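The CDC responsibility above can be sketched with a simplified watermark-polling approach; PostgreSQL and OpenSearch are stood in by in-memory structures, and a production pipeline might instead use log-based CDC (e.g. logical replication). All names here are hypothetical.

```python
# Sketch of watermark-based change capture: poll a source table for rows
# updated since the last sync and upsert them into a search index.
# Dicts stand in for PostgreSQL and OpenSearch; in practice this would be
# a SQL query plus an OpenSearch bulk request.

source_table = [
    {"id": 1, "filing": "10-Q", "updated_at": 100},
    {"id": 2, "filing": "10-K", "updated_at": 105},
]
search_index: dict[int, dict] = {}

def sync_changes(last_watermark: int) -> int:
    """Upsert rows changed after last_watermark; return the new watermark."""
    changed = [r for r in source_table if r["updated_at"] > last_watermark]
    for row in changed:
        search_index[row["id"]] = row  # idempotent upsert: replays are safe
    return max((r["updated_at"] for r in changed), default=last_watermark)

watermark = sync_changes(0)          # initial sync picks up both rows
watermark = sync_changes(watermark)  # replay is a no-op: nothing newer
```

The upsert-by-id design makes replays harmless, which is the property that lets a sync like this recover from partial failures without double-indexing.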

What We Are Looking For

  • 2 to 4 years of hands-on backend engineering experience in Python
  • Experience with Django and Django REST Framework
  • Solid understanding of PostgreSQL: schema design, query optimization, indexing
  • Experience building ETL pipelines, workflow orchestration, or background job systems
  • Understanding of retries, dead-letter queues, idempotency, and failure handling in async systems
  • Experience with cloud infrastructure on GCP or AWS (Cloud Run, Cloud Functions, or equivalent)
  • Comfort working with semi-structured or document-oriented data (XML, HTML, nested hierarchies), not just flat tables
  • Strong debugging instincts and a default-to-ownership mindset
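The retries, dead-letter, and idempotency expectations above can be illustrated with a small in-memory consumer; the queue, handler, and attempt limit are illustrative stand-ins, not Finrep's actual Pub/Sub or Celery configuration.

```python
# Sketch of retry-then-dead-letter handling with idempotency keys for an
# async task consumer. All names and limits here are illustrative.
MAX_ATTEMPTS = 3
dead_letter: list[dict] = []      # tasks parked after exhausting retries
processed_keys: set[str] = set()  # idempotency keys of completed tasks
calls: dict[str, int] = {}        # per-key delivery count (for the demo)

def handle(task: dict, worker) -> None:
    if task["key"] in processed_keys:
        return  # duplicate delivery: work already done, safely skip
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            worker(task)
            processed_keys.add(task["key"])
            return
        except Exception:
            if attempt == MAX_ATTEMPTS:
                dead_letter.append(task)  # park for inspection and replay

def flaky(task: dict) -> None:
    """Fails on the first attempt for each key, succeeds on retry."""
    calls[task["key"]] = calls.get(task["key"], 0) + 1
    if calls[task["key"]] < 2:
        raise RuntimeError("transient error")

handle({"key": "filing-123"}, flaky)  # fails once, succeeds on retry
handle({"key": "filing-123"}, flaky)  # duplicate delivery: no-op
```

Tracking an idempotency key per task is what makes at-least-once delivery safe: a redelivered message is recognized and skipped instead of reprocessed.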

Bonus

  • Experience with OpenSearch or Elasticsearch
  • Experience with Prefect
  • Experience with CDC or event-driven data sync pipelines
  • Familiarity with SEC filings, XBRL, or financial reporting data
  • Exposure to working alongside agentic systems

Tech Stack

Python, Django, DRF, PostgreSQL, OpenSearch, Prefect, GCP Cloud Run, Pub/Sub, Celery, Docker

Why This Role

You get founding-level ownership of data infrastructure at an AI company solving a real, underserved problem in public company financial reporting. The domain is complex, the data is non-trivial, and the surface area is wide enough that you will not run out of hard problems.

Job ID: 147165801
