Create and maintain optimal data pipeline architecture
Assist with assembling large, complex data sets that meet functional and non-functional business requirements.
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources and technologies, e.g. GCP.
Work with and manage internal and external datasets, including details of licensing, annotation format, classes/tasks, etc.
Work with stakeholders, including the function owner, architects, leads, and the Product, Data, and Design teams, to assist with data-related technical issues and support their data infrastructure needs.
Work with the data and analytics teams to strive for greater functionality in our data systems.
Build, grow, and retain talent, providing technical leadership and direction
Support automation of robust ML and CI/CD pipelines
Skills
Programming: Python, PyTorch
Machine Learning: Classification, regression, clustering, CNNs and RNNs
Experience building and optimizing big data pipelines, architectures, and data sets
Manipulating, processing, and extracting value from large, disconnected datasets.
Working knowledge of message queuing, stream processing, and highly scalable big data stores.
Experience in cloud automation (GCP/Azure/AWS)
Experience using software design patterns and data structures
Expertise in Python programming and good coding practices
Excellent analysis and troubleshooting skills using a structured, documented approach
Excellent communication skills, both written and verbal
Strong experience in capitalising on knowledge and best practices through the creation and updating of standards
Strong experience in training and mentoring team members