Role Summary:
We are seeking a skilled Data Engineer to join our dynamic team. In this role, you'll play a critical part in developing and maintaining Big Data platforms (Data Lake, Data Warehouse, and Data Integration) and advanced analytics solutions. You'll collaborate with application architects, business SMEs, and cross-functional teams to design scalable data pipelines and support innovative data-driven projects.
Key Responsibilities:
- Platform Development: Design, develop, and maintain Big Data platforms including Data Lakes, Data Warehouses, and advanced analytics infrastructures.
- Big Data Architecture: Apply hands-on expertise in data architecture, focusing on Data Warehouse Appliances, Open Data Lakes (AWS EMR, Hortonworks), and Data Lake Technologies (AWS S3, Databricks).
- Data Engineering: Develop and manage Spark ETL frameworks, orchestrate data pipelines using Airflow, and support Presto/Trino query development for stakeholders.
- Machine Learning Pipelines: Design, scale, and deploy ML pipelines for advanced analytics use cases.
- Collaboration: Work closely with application architects, data scientists, and business leaders to deliver end-to-end data solutions.
- Mentorship: Lead code and design reviews, promote technical excellence, and mentor junior engineers.
- Continuous Improvement: Identify opportunities to enhance system performance, scalability, and reliability.
Qualifications & Skills:
- Educational Background: Bachelor's or Master's degree in Computer Science, Data Science, Information Technology, or a related field.
- Technical Proficiency:
  - Expertise in Big Data frameworks: Spark, Hadoop, Hive, Kafka, EMR
  - Strong knowledge of cloud-based Big Data solutions (AWS, GCP, Azure)
  - Advanced skills in data processing, orchestration (Airflow), and query optimization (Presto/Trino)
  - Proficiency in programming languages: Python, Java, Scala
  - Experience with data warehouses, data lakes, and ML platforms (Spark ML, H2O, KNIME)
  - Understanding of DevOps practices, CI/CD pipelines, and Agile methodologies
- Preferred Skills:
  - Experience with containerization and microservices architecture using Docker and Kubernetes
  - Familiarity with Data Science tools and frameworks
  - Strong problem-solving mindset with a passion for continuous learning
Personal Attributes:
- Passionate about Big Data and analytics
- Strong analytical and critical thinking skills
- Self-starter with the ability to work independently and in teams
- Excellent communication and interpersonal skills for cross-functional collaboration