We are looking for a Senior Big Data Engineer with deep experience building scalable, high-performance data processing pipelines using Snowflake (Snowpark) and the Hadoop ecosystem. You'll design and implement batch and streaming data workflows, transform complex datasets, and optimize infrastructure to power analytics and data science solutions.
Key Responsibilities:
- Design, develop, and maintain end-to-end scalable data pipelines for high-volume batch and real-time use cases.
- Implement advanced data transformations with Spark, Snowpark, and Pig, and manage data ingestion from relational sources with Sqoop.
- Process large-scale datasets from varied sources using tools across the Hadoop ecosystem.
- Optimize data storage and retrieval in Hive, HBase, and other distributed data stores.
- Collaborate closely with data scientists, analysts, and business stakeholders to enable data-driven decision-making.
- Ensure data quality, integrity, and compliance with enterprise security and governance standards.
- Tune and troubleshoot distributed data applications for performance and efficiency.
Must-Have Skills:
- 5+ years in Data Engineering or Big Data roles
- Expertise in:
  - Snowflake (Snowpark)
  - Apache Spark
  - Hadoop, MapReduce
  - Sqoop, Pig, HBase
- Strong knowledge of:
  - ETL/ELT pipeline design
  - Distributed computing principles
  - Big Data architecture & performance tuning
- Proven experience handling large-scale data ingestion, processing, and transformation
Nice-to-Have Skills:
- Workflow orchestration with Apache Airflow or Oozie
- Cloud experience: AWS, Azure, or GCP
- Proficiency in Python or Scala
- Familiarity with CI/CD pipelines, Git, and DevOps environments
Soft Skills:
- Strong problem-solving and analytical mindset
- Excellent communication and documentation abilities
- Ability to work independently and within cross-functional Agile teams