Role Summary
We are looking for a skilled Scala / PySpark Developer with hands-on experience in Hadoop ecosystems and CI/CD and GitOps practices. The ideal candidate will design, build, and optimize large-scale data processing solutions and contribute to reliable, automated deployment pipelines. Exposure to Kafka, AKHQ, Prometheus, Grafana, and Airflow will be an added advantage.
Key Responsibilities
- Design, develop, and maintain scalable data pipelines and batch processing frameworks using Scala and PySpark.
- Work extensively with Hadoop ecosystem components for distributed data storage and processing.
- Optimize Spark jobs for performance, scalability, and fault tolerance across large datasets.
- Build, manage, and enhance CI/CD pipelines following GitOps principles for automated deployments and environment consistency.
- Collaborate with data engineers, architects, DevOps teams, and business stakeholders to deliver robust data solutions.
- Perform code reviews, unit testing, debugging, and production support for data applications.
- Ensure adherence to coding standards, security practices, and operational excellence.
Required Skills & Qualifications
- Strong hands-on experience in Scala and PySpark development.
- Solid experience working with Hadoop and distributed data processing frameworks.
- Hands-on exposure to CI/CD tools and GitOps-based deployment practices.
- Good understanding of data engineering concepts, ETL/ELT pipelines, and big data architecture.
- Experience in troubleshooting, performance tuning, and production support.
- Strong problem-solving, communication, and collaboration skills.
Good to Have
- Knowledge of Kafka and event-driven/data streaming architectures.
- Exposure to monitoring and observability tools such as AKHQ, Prometheus, and Grafana.
- Experience with workflow orchestration tools like Apache Airflow.
- Familiarity with cloud platforms and containerized deployments.
Preferred Experience
Typically 3+ years of relevant experience in big data engineering / data platform development.
Mandatory Skills
Scala, PySpark, Kafka
Desirable Skills
Prometheus, Grafana, Airflow, Docker, Kubernetes