Key Responsibilities:
- Dataiku Leadership: Drive data engineering projects with a strong emphasis on using Dataiku for data preparation, analysis, visualization, and building end-to-end data solutions.
- Data Pipeline Development: Design, develop, and optimize robust, scalable data pipelines supporting business intelligence and advanced analytics initiatives. Build and maintain ETL/ELT processes across diverse data sources.
- Data Modeling & Architecture: Apply data modeling best practices (e.g., dimensional modeling and star schemas) to create scalable, efficient database designs with a focus on performance and integrity.
- ETL/ELT Expertise: Build and manage ETL/ELT pipelines using tools such as Dataiku, Apache Airflow, Talend, or SSIS to ensure efficient, reliable, and high-quality data flow (a minimal pipeline sketch follows this list).
- Gen AI Integration: Explore and integrate Generative AI solutions, leveraging LLM Mesh or similar frameworks to enhance platform capabilities.
- Programming & Scripting: Use Python and SQL extensively for data manipulation, automation, and the development of custom data applications.
- Cloud Platform Deployment: Deploy and manage data solutions on cloud platforms such as AWS or Azure, using services like S3, EC2, Azure Data Lake Storage, or Azure Synapse Analytics.
- Data Quality & Governance: Ensure high-quality, well-governed data through integration, monitoring, and adherence to governance standards and best practices.
- Collaboration & Mentorship: Work closely with data scientists, analysts, and business teams to gather requirements and deliver impactful data products. Mentor junior team members as needed.
- Performance Optimization: Monitor and enhance data pipeline and system performance to meet evolving business and technical needs.
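To make the ETL/ELT and Python/SQL responsibilities concrete, here is a minimal sketch of the kind of daily pipeline this role would own, assuming Apache Airflow 2.x with the Postgres provider installed. The connection id "warehouse_conn", the source query, and the table names are hypothetical placeholders, not a prescribed design.

```python
# Illustrative only: a minimal daily extract-transform-load DAG.
# Assumes Airflow 2.x; all connection ids and table names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


def extract_transform_load() -> None:
    hook = PostgresHook(postgres_conn_id="warehouse_conn")  # hypothetical connection
    # Extract: pull raw rows into a pandas DataFrame.
    df = hook.get_pandas_df("SELECT * FROM raw.orders")
    # Transform: drop incomplete and duplicate records.
    df = df.dropna(subset=["order_id"]).drop_duplicates(subset=["order_id"])
    # Load: write the cleaned rows to a curated schema.
    df.to_sql(
        "orders_clean",
        hook.get_sqlalchemy_engine(),
        schema="curated",
        if_exists="replace",
        index=False,
    )


with DAG(
    dag_id="orders_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    PythonOperator(task_id="etl", python_callable=extract_transform_load)
```

In practice the extract, transform, and load steps would typically be split into separate tasks so each stage can be retried and monitored independently.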
Required Skills & Experience:
- Demonstrated expertise in using Dataiku for data pipeline development and visualization (see the recipe sketch after this list)
- Strong hands-on experience with data modeling techniques and best practices
- Proficient in building and managing ETL/ELT processes using tools like Dataiku, Apache Airflow, Talend, or SSIS
- Familiarity with LLM Mesh or related Gen AI integration frameworks
- Strong programming skills in Python and SQL for scripting, analysis, and automation
- Hands-on experience deploying data solutions on AWS or Azure cloud platforms
- Solid understanding of Generative AI concepts and their application in data workflows
- Excellent problem-solving and analytical thinking skills
- Effective communication and collaboration abilities
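As an illustration of the Dataiku skills listed above, the sketch below shows a minimal Python recipe using Dataiku's Python API inside a project Flow. The dataset names "orders_raw" and "orders_prepared" and the cleanup steps are hypothetical examples, not a required approach.

```python
# Illustrative only: a minimal Dataiku Python recipe.
# Dataset names and preparation steps are hypothetical.
import dataiku
import pandas as pd

# Read an input dataset from the project Flow into a pandas DataFrame.
orders = dataiku.Dataset("orders_raw").get_dataframe()

# Representative preparation steps: coerce types and filter invalid rows.
orders["order_date"] = pd.to_datetime(orders["order_date"], errors="coerce")
orders = orders[orders["amount"] > 0]

# Write the prepared data to the recipe's output dataset, setting its schema.
dataiku.Dataset("orders_prepared").write_with_schema(orders)
```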
Bonus Points (Nice to Have):
- Experience with big data technologies such as Spark, Hadoop, or Snowflake
- Knowledge of data governance and security frameworks
- Experience with MLOps practices and tooling
- Contributions to open-source projects in the data or AI domain
Education:
- Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related quantitative field