Key Responsibilities:
Data Exploration and Insights:
- Continuously explore and analyze data to identify opportunities for improving data matching logic, including fuzzy matching techniques.
- Work with large and diverse datasets sourced from Excel files, databases, and other systems.
Data Quality Improvement:
- Analyze data quality issues within the SCI system and propose actionable solutions.
- Implement improvements to enhance overall data integrity and reliability.
Weekly Playback and Collaboration:
- Participate in weekly playback sessions, using Jupyter Notebook to present data insights and analytical findings.
- Incorporate feedback from stakeholders and prioritize follow-up explorations and analyses accordingly.
Project Scaling and Support:
- Support the scaling of the SCI project to new markets by assisting with data acquisition, cleansing, and validation.
- Prepare data ahead of each batch ingestion, then analyze and validate SCI records after ingestion.
Data Analysis and Validation:
- Conduct thorough analysis and validation of SCI records after batch ingestion to identify patterns and improve data quality.
- Proactively uncover insights and recommend or implement corrective actions.
Stakeholder Collaboration:
- Coordinate with business stakeholders to facilitate manual validation of flagged records.
- Effectively communicate findings, insights, and recommendations to both technical and non-technical audiences.
Technical Requirements:
- Minimum 5 years of experience as a Data Scientist.
- Strong proficiency in Python and SQL.
- Extensive experience using Jupyter Notebook for data analysis and visualization.
- Working knowledge of data matching techniques, especially fuzzy matching.
- Experience handling large datasets from multiple sources, including Excel and relational databases.
- Solid understanding of data quality frameworks and best practices.
Preferred Qualifications:
- Experience with data quality tools and methodologies.
- Familiarity with cloud platforms such as AWS, Azure, or GCP.
- Experience with data visualization tools such as Tableau or Power BI.
- Knowledge of statistical modeling and machine learning algorithms.