Data Engineering & Development
- Design, develop, and maintain Spark-based batch processing pipelines using Scala for large datasets.
- Implement efficient transformations, aggregations, and joins, ensuring correctness and scalability.
- Write optimized SQL for data extraction, validation, and reconciliation across sources and targets.
Performance, Quality & Reliability
- Tune Spark jobs (partitioning, caching, shuffles, memory/executor settings) to improve runtime and cost efficiency.
- Build data quality checks and validations to ensure accuracy, completeness, and consistency of outputs.
- Troubleshoot production issues, perform root-cause analysis, and implement preventive fixes.
Collaboration & Delivery
- Work with stakeholders to understand data requirements and translate them into technical solutions.
- Participate in code reviews, follow engineering best practices, and contribute to reusable components.
- Document pipelines, logic, and operational runbooks for maintainability and onboarding.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- 5–9 years of overall experience with strong hands-on development in Spark and Scala.
- Solid experience writing and optimizing SQL for analytics and data processing use cases.
- Strong understanding of distributed processing concepts, data transformations, and performance considerations.
- Ability to debug and resolve issues in data pipelines with a focus on reliability and quality.