We are seeking a highly skilled Data Engineer with expertise in ETL, PySpark, AWS, and big data technologies. The ideal candidate will have in-depth knowledge of Apache Spark, Python, and Java (Java 8 and above, including lambda expressions, the Streams API, exception handling, and collections). This role involves designing and developing scalable data processing pipelines for batch and real-time analytics.
Key Responsibilities
- Develop data processing pipelines using PySpark (see the illustrative sketch after this list).
- Create Spark jobs for data transformation and aggregation.
- Optimize query performance using file formats such as ORC, Parquet, and Avro.
- Design scalable pipelines for both batch and real-time analytics.
- Perform data enrichment and integrate with SQL databases.
- Collaborate with cross-functional teams to understand data requirements.
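To give a flavor of the day-to-day work, below is a minimal PySpark sketch of a batch transformation and aggregation job that writes Parquet output. The S3 paths, column names, and the daily-revenue metric are hypothetical placeholders, not part of any actual pipeline at the company.

```python
# Illustrative only: paths, column names, and the aggregation are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-order-aggregation").getOrCreate()

# Read raw order events (format and schema are assumptions for this sketch).
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Transform and aggregate: total revenue per customer per day.
daily_revenue = (
    orders
    .withColumn("order_date", F.to_date("order_timestamp"))
    .groupBy("customer_id", "order_date")
    .agg(F.sum("amount").alias("total_revenue"))
)

# Write the result back as Parquet, partitioned by date for efficient querying.
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_revenue/"
)

spark.stop()
```

Production pipelines in this role also cover real-time (streaming) workloads, data enrichment, and integration with SQL databases.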
Required Skills
- Expertise in ETL and big data technologies.
- In-depth knowledge of Apache Spark and PySpark.
- Proficiency in Python and Java (Java 8+).
- Hands-on experience with Spring Core, Spring MVC, and Spring Boot.
- Experience with REST APIs.
- Hands-on experience with AWS.
- Strong knowledge of data optimization with file formats such as ORC, Parquet, and Avro.
- Familiarity with SQL databases.