Develop and Optimize: Design, implement, and optimize robust PySpark pipelines for large-scale data processing workloads (a minimal end-to-end sketch follows this list).
Data Transformation: Collaborate with data teams to transform and aggregate data from various sources, ensuring data quality and integrity throughout the process.
Performance Tuning: Analyze and tune Spark applications to improve performance and efficiency, applying best practices for Spark configuration and resource management (see the configuration sketch after this list).
Testing and Validation: Conduct comprehensive testing and validation of data workflows to ensure the accuracy and reliability of processed data (a pytest-style sketch follows this list).
Technical Collaboration: Work closely with data engineers, data scientists, and business analysts to gather requirements and provide technical solutions tailored to data analytics needs.
Documentation: Maintain clear documentation of data processing workflows, Spark applications, and best practices to support knowledge sharing within the team.
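For illustration, here is a minimal sketch of the kind of PySpark pipeline described under "Develop and Optimize" and "Data Transformation": ingest raw records, enforce basic quality rules, aggregate, and write curated output. All paths, column names, and the specific aggregation are hypothetical, not a prescribed implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales_pipeline").getOrCreate()

# Ingest: read raw events from a (hypothetical) source path.
raw = spark.read.option("header", True).csv("/data/raw/sales/*.csv")

# Transform: enforce types and drop rows that fail basic quality checks,
# supporting the data quality and integrity responsibility above.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .withColumn("event_date", F.to_date("event_date"))
       .dropna(subset=["amount", "event_date", "region"])
       .filter(F.col("amount") > 0)
)

# Aggregate: daily revenue and order counts per region.
daily = clean.groupBy("event_date", "region").agg(
    F.sum("amount").alias("revenue"),
    F.count("*").alias("orders"),
)

# Load: write partitioned Parquet for downstream analytics consumers.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "/data/curated/daily_revenue"
)

spark.stop()
```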
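Performance tuning typically starts with Spark configuration and join strategy. A hedged sketch follows; the values shown are illustrative starting points only, since real settings depend on cluster resources and data volume.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("tuned_job")
    # Adaptive Query Execution (Spark 3.x) coalesces shuffle partitions
    # and can switch join strategies at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    # Size shuffle parallelism to the cluster instead of the 200 default.
    .config("spark.sql.shuffle.partitions", "128")
    # Broadcast small dimension tables to avoid shuffling the large side.
    .config("spark.sql.autoBroadcastJoinThreshold", str(64 * 1024 * 1024))
    .getOrCreate()
)

# Explicit broadcast hint for a join against a small lookup table
# (paths and table contents are hypothetical).
facts = spark.read.parquet("/data/curated/daily_revenue")
regions = spark.read.parquet("/data/reference/regions")
joined = facts.join(broadcast(regions), on="region", how="left")

# explain() surfaces the physical plan, usually the first step in tuning.
joined.explain()

spark.stop()
```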
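Workflow validation can be exercised against a local SparkSession so transformations are verified deterministically before deployment. A pytest-style sketch, with hypothetical data and a transformation mirroring the pipeline above:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def daily_revenue(df):
    # Transformation under test: total revenue per date and region.
    return df.groupBy("event_date", "region").agg(
        F.sum("amount").alias("revenue")
    )


def test_daily_revenue_aggregates_per_region():
    # local[1] keeps the test self-contained and deterministic.
    spark = SparkSession.builder.master("local[1]").appName("tests").getOrCreate()
    try:
        rows = [
            ("2024-01-01", "emea", 10.0),
            ("2024-01-01", "emea", 5.0),
            ("2024-01-01", "apac", 7.5),
        ]
        df = spark.createDataFrame(rows, ["event_date", "region", "amount"])
        result = {r["region"]: r["revenue"] for r in daily_revenue(df).collect()}
        # Validate accuracy of the aggregated output.
        assert result == {"emea": 15.0, "apac": 7.5}
    finally:
        spark.stop()
```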