Data Engineering & Development
- Design, develop, and maintain scalable batch and streaming data pipelines using Spark with Scala and Python.
- Implement efficient transformations, joins, aggregations, and data quality checks for large datasets.
- Build reusable frameworks/utilities to standardize pipeline patterns and reduce delivery time.
Performance & Reliability
- Tune Spark jobs (partitioning, caching, shuffles, memory/executor settings) to improve performance and cost efficiency.
- Troubleshoot production issues, perform root-cause analysis, and implement preventive fixes.
- Ensure reliability through robust logging, monitoring hooks, and failure-handling strategies.
Collaboration & Delivery
- Work with stakeholders to refine requirements and deliver well-documented, production-ready solutions.
- Conduct code reviews, enforce best practices, and mentor team members on Spark/Scala/Python patterns.
- Contribute to CI/CD-friendly development practices including testing, version control, and release readiness.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- 5–9 years of experience in data engineering or backend development with strong hands-on delivery ownership.
- Strong proficiency in Python and Apache Spark for large-scale data processing.
- Solid experience with Scala for Spark-based development and production-grade implementations.
- Working knowledge of Hive and SQL-based data querying/processing concepts.