This role is for a Data Engineer with extensive experience in data warehousing and big data processing. The ideal candidate will bring strong expertise in Python, PySpark, and AWS data services, and will design, develop, and maintain robust data pipelines.
Key Responsibilities
- Design and implement end-to-end data engineering solutions using a combination of Python and AWS data services such as Glue, Lambda, and EMR.
- Develop and maintain scalable ETL (Extract, Transform, Load) pipelines.
- Utilize PySpark for efficient big data processing and transformation (a brief illustrative sketch follows this list).
- Work with Kafka for real-time data ingestion and stream processing.
- Optimize and fine-tune data pipelines for performance, scalability, and cost-efficiency on the AWS platform.
- Implement data security measures and contribute to data governance practices.
- Collaborate with cross-functional teams to understand data requirements and translate them into scalable data architectures.
- Troubleshoot and resolve complex data workflow issues.
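As a rough illustration of the kind of batch pipeline this role builds, the sketch below shows a minimal PySpark job that reads raw events from S3, cleans them, and writes partitioned Parquet back to S3. The bucket names, paths, and column names (event_id, event_time) are hypothetical placeholders, not a prescribed design; in a Glue context the same logic would typically live inside a Glue job script.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical locations; replace with the actual raw and curated buckets.
RAW_PATH = "s3://example-raw-bucket/events/"
CURATED_PATH = "s3://example-curated-bucket/events/"

spark = SparkSession.builder.appName("events-daily-etl").getOrCreate()

# Extract: read raw JSON events landed by an upstream producer.
raw = spark.read.json(RAW_PATH)

# Transform: deduplicate, parse timestamps, and derive a partition column.
curated = (
    raw.dropDuplicates(["event_id"])                      # assumed unique key
       .withColumn("event_ts", F.to_timestamp("event_time"))
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("event_ts").isNotNull())
)

# Load: write partitioned Parquet for downstream Athena or Redshift Spectrum queries.
(curated.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet(CURATED_PATH))
```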
Skills
Required Skills:
- Strong expertise in Python and SQL for data manipulation and pipeline development.
- Hands-on experience with PySpark for big data processing.
- Deep understanding of AWS data services such as Redshift, Glue, S3, Lambda, Kinesis, Athena, and DynamoDB.
- Experience with Kafka for real-time data ingestion (see the streaming sketch after this list).
- Strong experience in data modeling, schema design, and ETL best practices.
- Excellent problem-solving skills with the ability to debug complex data workflows.
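To give a sense of the real-time ingestion work expected here, the sketch below uses Spark Structured Streaming to consume a Kafka topic and land micro-batches in S3. The broker address, topic name, and S3 paths are illustrative assumptions only, and the job would need the spark-sql-kafka connector available at runtime.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-ingest-demo").getOrCreate()

# Hypothetical Kafka endpoint and topic; replace with real connection details.
stream = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
         .option("subscribe", "clickstream-events")          # assumed topic
         .option("startingOffsets", "latest")
         .load()
)

# Kafka delivers key/value as binary; cast the payload to a string for parsing.
events = stream.select(
    F.col("key").cast("string"),
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp"),
)

# Land micro-batches as Parquet; the checkpoint makes the stream restartable.
query = (
    events.writeStream
          .format("parquet")
          .option("path", "s3://example-landing-bucket/clickstream/")      # assumed sink
          .option("checkpointLocation", "s3://example-landing-bucket/_chk/")
          .trigger(processingTime="1 minute")
          .start()
)

query.awaitTermination()
```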
Preferred Skills:
- Familiarity with infrastructure-as-code (IaC) tools such as Terraform or CloudFormation.
Qualifications
- A minimum of 5 years of experience in data engineering, data warehousing, and big data processing.