Key Responsibilities
- Build, deploy, and manage data pipelines using Python and PySpark
- Develop and optimize ETL/ELT processes to support data integration across systems
- Work directly with GCP and AWS services to implement scalable cloud-based data solutions
- Own data workflows end-to-end from ingestion to transformation to storage
- Continuously monitor and improve pipeline reliability, speed, and data quality
- Use GenAI and automation tools to speed up development and reduce manual effort
- Proactively debug, troubleshoot, and resolve data engineering issues
- Ensure data is available and trustworthy for analytics and downstream systems
- Deliver high-quality code and documentation with a bias for action
Required Skills & Qualifications
- Strong proficiency in Python and PySpark
- Hands-on experience with cloud platforms such as GCP and/or AWS
- Solid understanding of ETL/ELT, data warehouse, and data lake concepts
- Proficient in working with relational databases (preferably PostgreSQL and MySQL) as well as non-relational databases, with a focus on MongoDB
- Driven by delivery and results; you get things done efficiently
- Self-starter attitude with minimal need for hand-holding
- Excitement for automating work using GenAI or scripting tools
- Familiarity with slowly changing dimensions (SCD), change data capture (CDC), and real-time streaming vs. batch processing
Nice to Have
- Experience with CI/CD pipelines and Docker
- Understanding of data governance and observability
- Prior experience in fast-paced, execution-heavy teams
Why Join Us
- High-impact role where execution speed is valued and recognised
- Freedom to build, ship, and iterate without red tape
- Work with a lean, high-performing data team
- Opportunities to innovate with Generative AI tools
- Fast learning environment and ownership from day one
If you're someone who prefers delivering over deliberating, apply now and help us build data infrastructure that moves the needle.