Experience using scheduling tools such as Airflow.
Experience with most of the following technologies: Apache Hadoop, PySpark, Apache Spark, YARN, Hive, Python, ETL frameworks, MapReduce, SQL, RESTful services.
Sound working knowledge of Unix/Linux platforms.
Hands-on experience building data pipelines using Hadoop components - Hive, Spark, Spark SQL.
Experience with industry-standard version control tools (Git, GitHub), automated deployment tools (Ansible, Jenkins) and requirements management in JIRA.
Understanding of big data modelling using relational and non-relational techniques.
Experience debugging code issues and communicating the identified differences to the development team.
Good-to-Have Requirements
Experience with Elasticsearch.
Experience developing Java APIs.
Experience with data ingestion.
Understanding of, or experience with, cloud design patterns.
Exposure to DevOps practices and Agile methodologies such as Scrum and Kanban.