Position Overview
We are seeking an experienced Senior Data Engineer with at least 5 years of hands-on data engineering experience. The ideal candidate will have a solid understanding of big data technologies and be skilled in building scalable data infrastructure, designing ETL pipelines, and leveraging tools such as Hadoop, PySpark, Kafka, and Apache NiFi.
Key Responsibilities
- Design, develop, and maintain large-scale, high-performance data systems and data pipelines using Python, PySpark, Hadoop, and Kafka.
- Build, deploy, and optimize ETL workflows to process and transform large volumes of structured and unstructured data.
- Collaborate with cross-functional teams to understand requirements and implement solutions that meet business needs.
- Work with Apache NiFi for data ingestion, transformation, and flow management.
- Write and optimize complex SQL queries for data manipulation and reporting.
- Apply strong knowledge of data structures and algorithms to solve complex technical problems.
- Automate tasks and processes using shell scripts and Linux-based tools.
- Participate in code reviews and design discussions.
- Ensure adherence to best practices in software development, testing, and deployment.
- Continuously improve software performance, scalability, and reliability.
- Stay up to date with the latest developments in data engineering and big data technologies, and incorporate them into the team's practices.
Required Skills and Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- At least 5 years of professional software development experience, with strong expertise in the following:
  - Python: Advanced proficiency in Python, including libraries such as pandas and NumPy.
  - PySpark: Experience with distributed data processing using PySpark.
  - Hadoop: Familiarity with the Hadoop ecosystem, including HDFS, MapReduce, and related tools.
  - Kafka: Hands-on experience in building and maintaining Kafka-based messaging systems.
  - SQL: Strong knowledge of relational databases and advanced SQL querying.
  - Data Structures & Algorithms: Strong understanding and practical application of data structures and algorithms.
  - Data Engineering Best Practices: Deep understanding of data modeling, pipeline design, and data infrastructure architecture.
  - ETL Pipelines: Expertise in designing, building, and maintaining efficient ETL pipelines.
  - Apache NiFi: Knowledge of data flow management using Apache NiFi.
  - Shell Scripting: Proficiency in writing efficient shell scripts for task automation.
  - Linux: Strong knowledge of Linux systems and tools for development and deployment.
- Experience with Agile development methodologies.
- Excellent problem-solving skills and ability to troubleshoot complex technical issues.
- Strong communication skills with the ability to work in a collaborative team environment.
Preferred Qualifications
- Experience with platforms such as Cloudera and Databricks.
- Familiarity with containerization technologies like Docker and Kubernetes.
- Knowledge of data warehousing and data lake concepts.