Role & Responsibilities
- Design, develop, and maintain robust data pipelines using PySpark to process large-scale datasets efficiently (a brief illustrative sketch follows this list).
- Collaborate with Data Analysts and Data Engineers to translate business requirements into technical solutions.
- Optimize Spark applications for performance and scalability across various environments.
- Implement data quality checks and automate data workflows to ensure reliable data processing.
- Debug, troubleshoot, and resolve issues related to data pipelines and Spark applications.
- Document system architecture, data flows, and best practices for future reference and team knowledge sharing.
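For context on the day-to-day work described above, here is a minimal sketch of a PySpark pipeline with a simple data quality check and a partitioned write. It is illustrative only; the paths, column names, and application name are hypothetical and not part of the role.

    # Hypothetical example: read raw data, apply a basic quality check,
    # aggregate, and write partitioned output. All names/paths are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("example-etl-pipeline")  # hypothetical application name
        .getOrCreate()
    )

    # Read raw events (path and schema are illustrative).
    raw = spark.read.parquet("s3://example-bucket/raw/events/")

    # Data quality check: drop rows missing the key identifier and
    # report how many were rejected so the pipeline can be monitored.
    valid = raw.filter(F.col("event_id").isNotNull())
    rejected_count = raw.count() - valid.count()
    print(f"Rejected {rejected_count} rows with null event_id")

    # Simple transformation: daily counts per event type.
    daily = (
        valid
        .withColumn("event_date", F.to_date("event_ts"))
        .groupBy("event_date", "event_type")
        .agg(F.count("*").alias("event_count"))
    )

    # Write results partitioned by date for efficient downstream queries.
    daily.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://example-bucket/curated/daily_event_counts/"
    )

    spark.stop()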
Skills & Qualifications
Must-Have
- Proficiency in PySpark and Spark SQL for large-scale data processing
- Strong programming skills in Python
- Hands-on experience with ETL development and data pipeline automation
- Understanding of Big Data ecosystems and distributed computing principles
- Experience with SQL query development and optimization
- Knowledge of Linux/Unix environments and version control tools
- Ability to work on-site in India
Preferred
- Experience with cloud platforms like AWS or Azure
- Familiarity with other big data tools such as Hadoop, Hive, or Kafka
Benefits & Culture Highlights
- Exposure to cutting-edge big data projects in a fast-growing environment
- Opportunity to work directly with experienced data professionals and industry experts
- Supportive and collaborative team culture promoting continuous learning
Skills: Python, SQL, Apache Spark, Big Data