Job Description
A solid grounding in computer engineering, Unix, data structures, and algorithms is essential to meet this challenge.
7+ years of experience architecting, developing, releasing, and maintaining large-scale big data platforms on AWS or GCP
Understanding of how big data technologies and NoSQL stores such as MongoDB, HBase/HDFS, and Elasticsearch work together to power applications in analytics, AI, and knowledge graphs
Understanding of how data processing models, data locality patterns, disk I/O, network I/O, and shuffling affect large-scale text processing (e.g., feature extraction, searching)
Expertise with a variety of data processing systems, including streaming, event, and batch (Spark, Hadoop/MapReduce)
5+ years of experience configuring and deploying applications on Linux-based systems
5+ years of experience with Spark, especially PySpark, for transforming large volumes of unstructured text data and building highly optimized pipelines (see the sketch after this list)
Experience with RDBMSs, ETL techniques and frameworks (Sqoop, Flume), and big data querying tools (Pig, Hive)
Stickler for world-class best practices: uncompromising on engineering quality, conversant with standards and reference architectures, and steeped in the Unix philosophy, with an appreciation of big data design patterns, orthogonal code design, and functional computation models

Skills: Apache Hadoop, PySpark, Python, Design patterns, Data Structures and Algorithms
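To give a flavor of the PySpark work referenced above, here is a minimal sketch of transforming unstructured text into token-frequency features. The input path `data/raw_text/` and the local SparkSession setup are illustrative assumptions, not part of any actual codebase for this role.

```python
# Minimal PySpark sketch: turning raw, unstructured text files into token counts.
# The input path "data/raw_text/" is hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("text-feature-sketch").getOrCreate()

# Read every file under the (hypothetical) input directory, one row per line of text.
lines = spark.read.text("data/raw_text/")  # DataFrame with a single "value" column

# Lowercase, split on non-word characters, and explode into one token per row.
tokens = (
    lines
    .select(F.explode(F.split(F.lower(F.col("value")), r"\W+")).alias("token"))
    .where(F.col("token") != "")
)

# Aggregate token frequencies - a simple stand-in for feature extraction.
token_counts = tokens.groupBy("token").count().orderBy(F.desc("count"))

token_counts.show(20, truncate=False)
spark.stop()
```

In a production pipeline the same structure would typically be extended with partitioning, caching, and broadcast joins to keep shuffles and disk I/O under control, which is the kind of optimization the role calls for.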