Teamware Solutions is seeking a skilled professional for the BigData and Hadoop Ecosystems Engineer role. This position is central to designing, building, and maintaining scalable big data solutions. You will work across the Hadoop ecosystem and related technologies, ensure smooth data operations, and contribute to business objectives through expert analysis, development, implementation, and troubleshooting within the BigData and Hadoop Ecosystems domain.
Roles and Responsibilities:
- Big Data Platform Management: Install, configure, and maintain components of the Hadoop ecosystem (e.g., HDFS, YARN, Hive, Spark, Kafka, HBase) to ensure optimal performance, scalability, and high availability.
- Data Pipeline Development: Design, develop, and implement robust and efficient data pipelines for ingestion, processing, and transformation of large datasets using tools like Apache Spark, Hive, or Kafka (a minimal sketch follows this list).
- Performance Tuning: Monitor the performance of big data clusters and applications. Identify bottlenecks and implement optimization strategies for Spark jobs, Hive queries, and other big data processes.
- Data Lake/Warehouse Design: Contribute to the design and implementation of data lake and data warehouse solutions leveraging Hadoop-based technologies.
- ETL/ELT Processes: Develop and manage complex ETL/ELT processes to integrate data from various sources into the big data ecosystem.
- Troubleshooting: Perform in-depth troubleshooting, debugging, and resolution for complex issues within the Hadoop ecosystem, including cluster stability, data processing failures, and performance degradation.
- Security & Governance: Implement and maintain security best practices for big data platforms, including access control, encryption, and data governance policies.
- Automation: Develop scripts and automation routines for cluster management, deployment, monitoring, and routine operational tasks within the big data environment.
- Collaboration: Work closely with data scientists, data analysts, application developers, and infrastructure teams to support data-driven initiatives.
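To illustrate the Data Pipeline Development and ETL/ELT responsibilities above, here is a minimal sketch of a PySpark batch job that ingests raw CSV files from HDFS, applies basic cleansing, and writes partitioned Parquet to a curated zone of a data lake. The paths, column names, and job name are hypothetical placeholders, not an actual Teamware Solutions pipeline.

```python
# Minimal PySpark batch pipeline sketch (hypothetical paths and columns).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders_ingest_example")   # hypothetical job name
    .enableHiveSupport()                # allows writing to Hive-managed tables if needed
    .getOrCreate()
)

# Ingest: read raw CSV files landed in HDFS (path is a placeholder).
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///data/raw/orders/")
)

# Transform: basic cleansing plus a derived partition column.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write partitioned Parquet into the curated zone of the data lake.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("hdfs:///data/curated/orders/")
)

spark.stop()
```

A job like this would typically be submitted to YARN via spark-submit, where tuning executor count, executor memory, cores, and spark.sql.shuffle.partitions is a routine part of the Performance Tuning responsibility described above.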
Preferred Candidate Profile:
- Hadoop Ecosystem Expertise: Strong hands-on experience with core components of the Hadoop ecosystem (HDFS, YARN) and related technologies like Apache Spark, Hive, Kafka, HBase, or Presto.
- Programming/Scripting: Proficiency in programming languages commonly used in big data, such as Python, Scala, or Java, along with strong scripting skills for automation.
- SQL Proficiency: Excellent proficiency in SQL for data manipulation and querying in big data environments (e.g., HiveQL, Spark SQL); a brief example appears at the end of this posting.
- Cloud Big Data (Plus): Familiarity with cloud-based big data services (e.g., AWS EMR, Azure HDInsight, Google Cloud Dataproc) is a plus.
- Distributed Systems: Understanding of distributed computing principles and challenges in managing large-scale data systems.
- Problem-Solving: Excellent analytical and problem-solving skills with a methodical approach to complex big data challenges.
- Communication: Strong verbal and written communication skills to articulate technical concepts and collaborate effectively with diverse teams.
- Education: Bachelor's degree in Computer Science, Data Engineering, Information Technology, or a related technical field.
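As a small illustration of the SQL proficiency expected (HiveQL/Spark SQL), the sketch below runs an aggregation through Spark SQL against a hypothetical Hive table; the database, table, and column names are illustrative assumptions only.

```python
# Spark SQL aggregation sketch over a hypothetical Hive table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("orders_daily_revenue_example")  # hypothetical job name
    .enableHiveSupport()
    .getOrCreate()
)

# Daily revenue per region over the last 30 days (placeholder table and columns).
daily_revenue = spark.sql("""
    SELECT order_date,
           region,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_revenue
    FROM   curated.orders
    WHERE  order_date >= DATE_SUB(CURRENT_DATE(), 30)
    GROUP  BY order_date, region
    ORDER  BY order_date, region
""")

daily_revenue.show(20, truncate=False)

spark.stop()
```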