Job Summary
We are seeking a highly skilled Senior Python Data Science Expert with extensive experience managing and analyzing very large datasets on cloud platforms and implementing data lakehouse architectures. The ideal candidate will have a strong background in data science, big data technologies, and cloud computing.
Responsibilities
- Data Analysis: Perform complex data analysis on large datasets to extract actionable insights and support decision-making processes.
- Data Lakehouse Implementation: Design, implement, and maintain data lakehouse architectures that integrate the benefits of data lakes and data warehouses.
- Cloud Integration: Develop and manage data processing systems on cloud platforms such as AWS, Google Cloud, or Microsoft Azure.
- ETL Processes: Design and optimize ETL (Extract, Transform, Load) processes to ensure efficient data ingestion, transformation, and storage.
- Machine Learning: Develop and deploy machine learning models to solve business problems and enhance data-driven decision-making.
- Collaboration: Work closely with cross-functional teams, including data engineers, data analysts, and IT professionals, to ensure seamless data operations.
- Data Governance: Implement data governance policies to ensure data quality, security, and compliance with industry standards.
- Performance Optimization: Monitor and optimize the performance of data processing systems and machine learning models.
- Documentation: Maintain comprehensive documentation of data workflows, models, and processes.
Required Skills
- Experience: 8+ years of hands-on experience in data science and big data technologies.
- Python: Strong expertise in Python programming for data analysis and machine learning.
- Big Data Technologies: Proficiency in big data tools and frameworks such as Hadoop, Spark, and Kafka.
- Cloud Platforms: Extensive experience with cloud platforms such as AWS, Google Cloud, or Microsoft Azure.
- Data Lakehouse: Knowledge of data lakehouse architectures and their implementation.
- ETL Processes: Expertise in designing and optimizing ETL processes.
- Machine Learning: Experience with machine learning frameworks such as TensorFlow, PyTorch, and scikit-learn.
- Database Technologies: Proficiency in database technologies such as SQL, NoSQL, and data warehousing solutions.
- Data Governance: Understanding of data governance principles and practices.
- Collaboration: Strong collaboration skills to work effectively with cross-functional teams.
- Advanced Analytics: Experience with advanced analytics techniques such as predictive modeling, clustering, and anomaly detection.
- Data Visualization: Proficiency in data visualization tools such as Tableau, Power BI, or matplotlib.
- DevOps: Familiarity with DevOps practices for continuous integration and deployment.
- Security: Knowledge of data security best practices and tools.
- Performance Tuning: Experience in performance tuning and optimization of data processing systems.
Experience & Qualifications
- Statistical Analysis: Advanced knowledge of statistical analysis and modeling techniques.
- Programming Languages: Expertise in programming and scripting languages such as C++, Java, R, Scala, SAS, and SQL.
- Unstructured Data: Experience in navigating and extracting important information from unstructured data.
- Project Management: Strong project management skills to lead data science projects and initiatives.