Key Responsibilities:
- Dataiku Leadership: Lead data engineering initiatives that leverage Dataiku for data preparation, analysis, visualization, and deployment of data-driven solutions.
- Data Pipeline Development: Design, develop, and optimize scalable and robust data pipelines to support business intelligence and advanced analytics projects, including automation of ETL/ELT processes from diverse data sources.
- Data Modeling & Architecture: Apply best practices in data modeling (e.g., Kimball-style dimensional modeling, Inmon-style normalized modeling) to design efficient, scalable database architectures that ensure data integrity and performance.
- ETL/ELT Expertise: Implement, manage, and optimize ETL/ELT workflows using various tools to maintain reliable, high-quality data flow and accessibility.
- Gen AI Integration: Explore and implement solutions using LLM Mesh or similar frameworks to integrate Generative AI capabilities into data engineering processes.
- Programming & Scripting: Use Python and SQL extensively for data manipulation, automation, and development of custom data solutions.
- Cloud Platform Deployment: Deploy and manage scalable data solutions on AWS or Azure cloud platforms, leveraging cloud services for performance and cost efficiency.
- Data Quality & Governance: Ensure that integrated data sources remain high-quality, consistent, and accessible; implement and follow data governance best practices.
- Collaboration & Mentorship: Work closely with data scientists, analysts, and other stakeholders to translate data requirements into effective solutions; mentor junior team members when needed.
- Performance Optimization: Continuously monitor and optimize data pipeline and system performance to meet business needs.
Required Skills & Experience:
- Proficiency in Dataiku for data preparation, visualization, and building end-to-end data pipelines and applications.
- Strong expertise in data modeling techniques, including Kimball-style dimensional modeling and Inmon-style normalized modeling.
- Extensive experience with ETL/ELT tools and processes (e.g., Dataiku built-in tools, Apache Airflow, Talend, SSIS).
- Familiarity with LLM Mesh or similar Generative AI frameworks.
- Advanced skills in Python programming and SQL querying for data manipulation and automation.
- Hands-on experience with cloud platforms like AWS or Azure for scalable data deployments.
- Understanding of Generative AI concepts and potential applications.
- Excellent analytical, problem-solving, communication, and interpersonal skills.
Bonus Skills (Nice to Have):
- Experience with big data technologies such as Spark, Hadoop, and Snowflake.
- Knowledge of data governance and security best practices.
- Familiarity with MLOps principles and tools.
- Contributions to open-source projects in data engineering or AI.
Education:
- Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related quantitative field.