Create new Scala/Spark jobs and maintain existing ones for data transformation and aggregation, ranging from simple to complex, across structured and unstructured data.
Produce unit tests for Spark transformations and for helper methods used in performance optimisation.
Develop data processing pipelines, data storage, and management architecture.
Define scalable calculation logic for interactive and batch use cases.
Interact with infrastructure and data teams to produce complex analysis across data.
You'll be working on a unique and challenging big data ecosystem with a focus on storage efficiency, data security and privacy, scalable and performant queries, expandability, and flexibility, and you'll help better measure the quality of map data.
You will work with engineers to build a big data platform that processes and manages exabytes of data and enables efficient access to that data.
Requirements
Minimum 7 years of working experience in Big Data and Hadoop platforms, and building large-scale distributed systems with high availability.
Experience in developing Spark applications using Hadoop and the Spark RDD, Spark SQL, Spark Streaming, Spark MLlib, and DataFrame APIs.
Should have broad knowledge of Spark's advantages and workflows, how to write Spark jobs, and Spark query tuning and performance optimisation.
Solid grounding in data structures and algorithm basics.
Should have good hands-on experience with at least one programming language (Scala/Java 8: 1st preference; Python: 2nd preference; Java: 3rd preference).
Strong investigative and problem-solving skills.
Knowledge of data ingestion, optimisation techniques, and the design and development of data transformation and aggregation pipelines is required.
Experience working on cutting-edge Big Data storage systems and technologies like Hadoop, HDFS, AWS S3, AWS Lambda, Storm/Heron, Cassandra, Apache Kafka, Solr/ElasticSearch, MongoDB, DynamoDB, Postgres, MySQL, etc.