Key Skills: Shell Scripting, PySpark, Linux, Hadoop Ecosystem, UNIX, Python, Big Data
Roles and Responsibilities:
- Administer and maintain Hadoop clusters, ensuring high availability and performance.
- Develop and implement data processing workflows using PySpark.
- Monitor system performance and troubleshoot issues as they arise.
- Collaborate with data engineers and analysts to optimize data storage and retrieval.
- Write and maintain shell scripts for automation of routine tasks.
- Ensure data security and compliance with organizational policies.
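The shell-scripting responsibility above can be sketched with a minimal automation example, such as rotating old application logs. This is an illustrative sketch only; the function name, directory, and retention period are assumptions, not part of the role description.

```shell
#!/bin/sh
# cleanup_logs DIR DAYS: a minimal log-retention sketch.
# Removes regular *.log files in DIR older than DAYS days.
# The function name and arguments are hypothetical examples.
cleanup_logs() {
  dir="$1"
  days="$2"
  # -mtime "+N" matches files modified strictly more than N days ago.
  find "$dir" -type f -name '*.log' -mtime "+$days" -exec rm -f {} \;
}
```

In practice a script like this would typically be scheduled via cron and log its own actions, so routine cleanup requires no manual intervention.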
Skills Required:
- Strong expertise in administering Hadoop clusters and related ecosystem components
- Proficiency in Python and PySpark for developing and optimizing data workflows
- Hands-on experience with Linux/UNIX systems and shell scripting for automation
- Knowledge of Big Data concepts and tools (HDFS, Hive, HBase, Spark)
- Ability to monitor, troubleshoot, and optimize system performance
- Understanding of data security, compliance, and governance best practices
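The monitoring and troubleshooting skill above can be illustrated with a minimal sketch: a script that flags any mounted filesystem above a usage threshold, a common first check when a Hadoop node degrades. The function name and threshold are assumptions for illustration.

```shell
#!/bin/sh
# check_disk_usage THRESHOLD: a minimal monitoring sketch.
# Prints a WARN line for each mounted filesystem whose usage
# percentage is at or above THRESHOLD (an assumed parameter).
check_disk_usage() {
  threshold="$1"
  # df -P gives POSIX-stable columns: $5 = capacity %, $6 = mount point.
  df -P | awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                      # strip the trailing "%"
    if (use + 0 >= t) print "WARN:", $6, "at", use "%"
  }'
}
```

A check like this is usually wired into a scheduler or alerting hook so that capacity problems surface before they affect HDFS availability.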
Education: A degree in Computer Science or a related field is preferred.