Responsibilities:
Data Exploration and Insights:
- Continuously explore and analyze large datasets from various sources (Excel, databases) to identify opportunities for enhancing data matching logic, including fuzzy matching
- Generate actionable insights to improve data quality within the SCI solution
Data Quality Improvement:
- Perform targeted analyses to identify and address data quality issues
- Propose and implement effective solutions to improve SCI system data integrity
Weekly Playback and Collaboration:
- Present insights in weekly playback sessions using Jupyter Notebook
- Incorporate feedback from working groups to refine ongoing analysis
Project Scaling and Support:
- Support the scaling of the SCI project by contributing to data acquisition, cleansing, and validation processes for new markets
- Perform analysis and validation before and after each batch ingestion
Data Analysis and Validation:
- Conduct in-depth validation of SCI records after batch ingestion
- Proactively identify quality gaps and implement improvements
Stakeholder Collaboration:
- Work closely with business stakeholders to validate records requiring manual intervention
- Communicate analytical findings and recommended actions clearly and effectively
Technical Requirements:
- 5+ years of experience as a Data Scientist
- Strong proficiency in Python and SQL
- Hands-on experience using Jupyter Notebook for analysis and visualization
- Solid understanding of data matching techniques, including fuzzy matching
- Experience with large datasets from multiple data sources (Excel, databases)
- In-depth knowledge of data quality principles and methodologies
Skills:
SQL, Machine Learning, Data Analysis, Jupyter Notebook, Data Cleansing, Fuzzy Matching, Python, Data Quality Improvement, Data Validation, Data Acquisition, Communication and Collaboration, Problem-Solving, Analytical Skills
Preferred Qualifications:
- Experience with dedicated data quality tools and frameworks
- Familiarity with cloud platforms such as AWS, Azure, or GCP
- Experience with data visualization tools (e.g., Tableau, Power BI)
- Knowledge of statistical modeling and machine learning algorithms