Job Summary
Seeking an experienced Cloudera Architect / Technical Lead with 8+ years of experience in Cloudera CDP, PySpark, Python, Hadoop, AWS, Oozie, and Control-M to design and lead scalable enterprise data solutions, optimize data platforms, and provide technical leadership.
Key Responsibilities
- Design and implement scalable data pipelines using Cloudera CDP/CDH, PySpark, and Python.
- Architect and optimize data processing solutions on the Hadoop ecosystem (HDFS, Hive, YARN).
- Manage data movement and storage across AWS S3 and HDFS.
- Develop and maintain workflow orchestration using Oozie and Control-M.
- Build FastAPI-based services for data and operational requirements.
- Ensure platform performance, scalability, security, and reliability.
- Lead architecture reviews, solution design, and technical governance activities.
- Collaborate with stakeholders and mentor development teams.
Required Skills
- Strong experience with Cloudera CDP/CDH and the Hadoop ecosystem (HDFS, Hive, YARN, Impala).
- Hands-on expertise in PySpark, Spark SQL, and Python.
- Experience with AWS services, particularly S3, IAM, and EC2.
- Strong knowledge of Oozie workflow orchestration and Control-M scheduling.
- Experience designing and managing large-scale data lake environments.
- Proficiency in Linux and shell scripting.
- Understanding of distributed computing concepts including data skew, shuffle optimization, broadcast joins, and resource tuning.
- Experience with version control and CI/CD tools such as Git, Jenkins, Azure DevOps, or GitHub Actions.
Skills: Data, AWS, Architect, Python, Oozie, Hadoop, Cloudera