Intern-Big Data

Seagate Technology

Pune

0-1 Years

Job Description

Role Summary:

We are seeking a skilled Data Engineer to join our dynamic team. In this role, you'll play a critical part in developing and maintaining Big Data platforms (Data Lake, Data Warehouse, and Data Integration) and advanced analytics solutions. You'll collaborate with application architects, business SMEs, and cross-functional teams to design scalable data pipelines and support innovative data-driven projects.

Key Responsibilities:

Platform Development: Design, develop, and maintain Big Data platforms, including Data Lakes, Data Warehouses, and advanced analytics infrastructure.
Big Data Architecture: Apply hands-on expertise in data architecture, focusing on Data Warehouse Appliances, Open Data Lakes (AWS EMR, Hortonworks), and Data Lake Technologies (AWS S3, Databricks).
Data Engineering: Develop and manage Spark ETL frameworks, orchestrate data pipelines using Airflow, and support Presto/Trino query development for stakeholders.
Machine Learning Pipelines: Design, scale, and deploy ML pipelines for advanced analytics use cases.
Collaboration: Work closely with application architects, data scientists, and business leaders to deliver end-to-end data solutions.
Mentorship: Lead code and design reviews, promote technical excellence, and mentor junior engineers.
Continuous Improvement: Identify opportunities to enhance system performance, scalability, and reliability.

Qualifications & Skills:

Educational Background: Bachelor's or Master's degree in Computer Science, Data Science, Information Technology, or related fields.
Technical Proficiency:
Expertise in Big Data frameworks and services: Spark, Hadoop, Hive, Kafka, and AWS EMR
Strong knowledge of cloud-based Big Data solutions (AWS, GCP, Azure)
Advanced skills in data processing, orchestration (Airflow), and query optimization (Presto/Trino)
Proficiency in programming languages: Python, Java, Scala
Experience with data warehouses, data lakes, and ML platforms (Spark ML, H2O, KNIME)
Understanding of DevOps practices, CI/CD pipelines, and Agile methodologies

Preferred Skills:

Experience with containerization and microservices architecture using Docker and Kubernetes
Familiarity with Data Science tools and frameworks
Strong problem-solving mindset with a passion for continuous learning

Personal Attributes:

Passionate about Big Data and analytics
Strong analytical and critical thinking skills
Self-starter with the ability to work independently and in teams
Excellent communication and interpersonal skills for cross-functional collaboration