We are seeking an experienced PySpark Developer. The ideal candidate has strong PySpark knowledge and hands-on experience with SQL, HDFS, Hive, Spark, and Python. You will develop and optimize data pipelines, work with large datasets, and implement data processing solutions, particularly within the Azure Databricks environment.
Key Responsibilities
- PySpark Development: Design, develop, and maintain robust and scalable data processing solutions using PySpark and Python.
- Data Lake & Warehousing: Work with large datasets stored in HDFS and Hive, applying partitioning and bucketing to optimize data storage and retrieval.
- SQL & Data Processing: Utilize SQL and PySpark for efficient data manipulation, transformation, and processing.
- Azure Databricks: Develop and deploy solutions on Azure Databricks, including PySpark notebook development.
- ETL & Data Pipelines: Build and optimize data pipelines, demonstrating a good understanding of Hadoop and Spark architectures.
- SCD Implementation: Contribute to the implementation of Slowly Changing Dimension (SCD) Type 1 and Type 2.
- Collaboration: Work closely with data scientists and other engineers, contributing to data preparation for analytical and machine learning models.
- Performance Optimization: Ensure the performance and efficiency of data processing jobs.
Required Skills and Experience
- 8+ years of total IT experience, with 5+ years of relevant work experience as a data engineer/developer.
- Strong PySpark knowledge and hands-on development experience.
- Proficiency in SQL, HDFS, Hive, Spark, and Python.
- Good understanding of Hadoop and Spark architectures.
- Good understanding of partitioning and bucketing concepts in Hive.
- Good understanding of data and data processing using SQL or PySpark.
- Good experience writing code in Python and PySpark.
- Good experience with Azure Databricks and PySpark notebook development.
- Experience implementing SCD Type 1 and Type 2.
Mandatory Skills
- PySpark development
- Azure Stack