Job description
Primary Tech skills:
- Advanced web crawling and scraping methods and tools
- Building end-to-end data engineering pipelines for semi-structured and unstructured data (text, simple and complex table structures, images, video, and audio)
- Python, PySpark, SQL, RDBMS
- Data Transformation (ETL/ELT) activities
- Working experience with a SQL data warehouse (e.g. Snowflake); administration experience preferred
Secondary Tech skills:
- Databricks
- Familiarity with AWS services: S3, Glue, EMR, EC2, RDS, monitoring, and IAM
- Kafka, Spark & Kafka Streaming
- Workflow automation (e.g. using GitHub Actions)
- Performing root cause analysis (RCA)
Responsibilities:
- Develop, maintain, and optimize data pipelines, workflows, and the Feature Store to ensure seamless data ingestion and transformation as part of a scalable data solution.
- Architect, design, develop, and implement data engineering pipelines with attention to performance and scalability across data storage and processing.
- Implement advanced data transformations and quality checks to ensure data accuracy, completeness, security, and consistency.
- Integrate data from diverse sources for ingestion, transformation, and storage, leveraging AWS S3 for storage and possibly Snowflake as the SQL data warehouse.
- Create and implement advanced data models and schemas, and apply data governance and data management best practices.
Qualifications and Desired Experience:
- 7+ years of data analysis and engineering experience
- Bachelor's degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field.
- Working knowledge of API- or stream-based data extraction processes (e.g. Salesforce API and Bulk API) and hands-on experience in web crawling.
Personal Skills:
- Ability to collaborate cross-functionally and build sound working relationships within all levels of the organization
- Ability to handle sensitive information with keen attention to detail and accuracy. Passion for data handling ethics.
- Effective time management skills and ability to solve complex technical problems with creative solutions while anticipating stakeholder needs and helping meet or exceed expectations
- Comfortable with ambiguity and the uncertainty of change when assessing stakeholder needs
- Self-motivated and innovative; confident when working independently, but an excellent team player with a growth-oriented personality