Job Description

We are looking for a Data Engineer who will work on collecting, storing, processing, and analyzing large data sets. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them. You will also be responsible for integrating them with the architecture used across the company. This position is based in our Mumbai (Lower Parel) office.

Responsibilities:

Selecting and integrating any data tools and frameworks required to provide the requested capabilities.
Implementing ETL processes (if importing data from existing data sources is relevant).
Monitoring performance and advising on any necessary infrastructure changes.
Defining data retention policies.

Skills and Qualifications:

Solid understanding of distributed computing principles.
Experience managing a Hadoop cluster and all of its included services.
Ability to resolve ongoing issues with operating the cluster.
Proficiency with Hadoop v2, MapReduce, and HDFS.
Experience building stream-processing systems using solutions such as Storm or Spark Streaming (if stream processing is relevant to the role).
Good knowledge of data querying tools such as Pig, Hive, and Impala.
Experience with Spark.
Experience integrating data from multiple data sources.
Experience with NoSQL databases such as HBase.
Good understanding of Lambda Architecture, along with its advantages and drawbacks.

Hands-on Checklist:

Knowledge of PySpark
Solid hands-on experience with SQL queries on Hive, including the ability to profile and understand the data
Python and pandas programming
Hadoop HDFS commands and permissions
Git code management and deployment
Designing and implementing big data pipelines
Data migration between Hadoop clusters
Good understanding of Hive table partitioning concepts
Good hands-on Unix shell scripting skills
Job scheduling
