We are seeking an experienced Big Data Engineer to design and maintain scalable data processing systems and pipelines across large-scale, distributed environments. This role requires deep expertise in tools such as Snowflake (Snowpark), Spark, Hadoop, Sqoop, Pig, and HBase. You will work closely with data scientists and stakeholders to transform raw data into actionable intelligence and power analytics platforms.
Key Responsibilities:
- Design and develop high-performance, scalable data pipelines for both batch and stream processing.
- Implement data transformations and ETL workflows using Spark, Snowflake (Snowpark), Pig, Sqoop, and related tools.
- Manage large-scale data ingestion from various structured and unstructured data sources.
- Work with Hadoop ecosystem components including MapReduce, HBase, Hive, and HDFS.
- Optimize storage and query performance for high-throughput, low-latency systems.
- Collaborate with data scientists, analysts, and product teams to define and implement end-to-end data solutions.
- Ensure data integrity, quality, governance, and security across all systems.
- Monitor, troubleshoot, and fine-tune the performance of distributed systems and jobs.
Must-Have Skills:
- Strong hands-on experience with:
  - Snowflake & Snowpark
  - Apache Spark
  - Hadoop, MapReduce
  - Pig, Sqoop, HBase, Hive
- Expertise in data ingestion, transformation, and pipeline orchestration
- In-depth knowledge of distributed computing and big data architecture
- Experience in data modeling, storage optimization, and query performance tuning