We are looking for a Senior Data Engineer with deep expertise in real-time data streaming and distributed data processing to design, build, and scale next-generation data platforms. This role is critical in enabling event-driven architecture and real-time analytics for mission-critical banking systems, particularly across risk and compliance functions.
You will collaborate closely with data architects, platform engineers, and business stakeholders to deliver low-latency, high-throughput data pipelines that power advanced analytics and decision-making.
Key Responsibilities
- Design, develop, and maintain real-time streaming pipelines using Apache Kafka, PySpark, and Flink
- Build scalable and fault-tolerant event-driven data architectures
- Process high-volume streaming data with low latency and high reliability
- Integrate data from multiple sources into centralized data platforms (Data Lake / Lakehouse)
- Optimize data pipelines for performance, scalability, and cost efficiency
- Ensure data quality, governance, and compliance aligned with banking standards
- Work with cross-functional teams to translate business requirements into technical solutions
- Monitor and troubleshoot streaming jobs and production pipelines
Required Skills & Experience
- 5+ years of experience in Data Engineering
- Strong hands-on experience with:
  - PySpark / Spark Streaming
  - Apache Kafka (Producers, Consumers, Kafka Streams)
  - Apache Flink or other real-time processing frameworks
- Experience building real-time / near real-time data pipelines
- Strong understanding of distributed systems and event-driven architecture
- Proficiency in Python / Java / Scala
- Experience with data lakes, ETL/ELT pipelines, and big data ecosystems
- Knowledge of cloud platforms (AWS / Azure / GCP) is a plus
- Familiarity with banking, risk, or compliance data systems is highly preferred
Preferred Qualifications
- Experience working in the financial services or banking domain
- Exposure to data governance, regulatory reporting, or compliance systems
- Knowledge of CI/CD pipelines and DevOps practices for data platforms