Key Responsibilities:
- Design & Build: Develop scalable, resilient ETL workflows and real-time data pipelines using Python, PySpark, and AWS services (a representative sketch of this kind of pipeline follows this list).
- Cloud Engineering: Utilize AWS services such as Glue, Lambda, EC2, RDS, and S3 to build efficient cloud-based data architectures.
- Snowflake Integration: Design and maintain data models, pipelines, and ingestion processes within Snowflake, ensuring performance and scalability.
- API Development: Integrate RESTful and other APIs to ingest and synchronize external and internal datasets.
- Optimization: Monitor and tune the performance of data workflows, pipelines, and queries for minimal latency and high throughput.
- Collaboration: Partner with data analysts, scientists, and stakeholders to define and deliver data requirements.
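For illustration only, a minimal PySpark batch job of the kind described above might look like the sketch below. The bucket names, paths, and column names are hypothetical, and S3 access is assumed to come from the job's runtime role (e.g., an AWS Glue or EMR job).

```python
# Illustrative sketch: bucket names, paths, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Read raw JSON events previously landed in S3.
raw = spark.read.json("s3://example-raw-bucket/orders/2024/")

# Basic cleansing: de-duplicate, parse timestamps, drop invalid rows.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount") > 0)
)

# Write partitioned Parquet for downstream consumers
# (e.g., a Snowflake external stage).
(cleaned.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/orders/"))
```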
Required Skills & Qualifications:
- Experience Level: 4+ years in data engineering or related roles.
- Programming Expertise: Proficient in Python and PySpark, with a strong background in data transformation and API integration.
- AWS Mastery: Deep hands-on experience with AWS Glue, Lambda, EC2, RDS, and S3.
- ETL & Real-Time Pipelines: Proven ability to build and scale batch and streaming data workflows.
- Snowflake: Hands-on experience with the Snowflake data warehouse, including data modeling, Snowpipe, and performance tuning (see the ingestion sketch after this list).
- API Integration: Strong understanding of REST APIs, JSON, and secure authentication protocols (see the API ingestion sketch after this list).
- Data Systems: Proficiency in relational databases (e.g., PostgreSQL, MySQL) and best practices in data warehousing.
- Problem Solver: Strong analytical skills with a knack for identifying and solving complex data challenges.
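As a rough sketch of the Snowflake ingestion work, loading staged cloud files typically comes down to a COPY INTO statement. The example below uses the snowflake-connector-python package; the account, warehouse, database, stage, and table names are hypothetical, and in a continuous pipeline the same COPY would usually be attached to a Snowpipe rather than run ad hoc.

```python
# Illustrative sketch: connection details, stage, and table names are hypothetical.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="RAW",
)
cur = conn.cursor()
try:
    # Bulk-load Parquet files from an external stage into a raw table.
    cur.execute("""
        COPY INTO raw.orders
        FROM @raw_orders_stage
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
finally:
    cur.close()
    conn.close()
```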
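Similarly, a typical API-to-lake ingestion step pairs an authenticated HTTP request with an object-store write. In this sketch the endpoint, token variable, and bucket name are hypothetical.

```python
# Illustrative sketch: endpoint, environment variables, and bucket are hypothetical.
import json
import os

import boto3
import requests

API_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint


def pull_and_land(page: int = 1) -> str:
    """Fetch one page of results and land the raw JSON in S3, returning the key."""
    resp = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['EXAMPLE_API_TOKEN']}"},
        params={"page": page},
        timeout=30,
    )
    resp.raise_for_status()

    key = f"raw/orders/page={page}.json"
    boto3.client("s3").put_object(
        Bucket="example-raw-bucket",
        Key=key,
        Body=json.dumps(resp.json()).encode("utf-8"),
    )
    return key
```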