Key Responsibilities:
- Data Pipeline Development: Design and implement robust ETL/ELT pipelines in Python and SQL to extract, clean, and load large datasets from multiple sources (databases, APIs, S3, etc.).
- API Integration: Build and maintain API connections for ingestion of campaign and performance data from platforms like Meta, Branch, Google, and affiliate networks.
- Automation & Scheduling: Automate recurring data workflows using scheduling tools like Jenkins, Airflow, Cron, or similar.
- Data Cleaning & Transformation: Ensure data quality through validation, transformation, and enrichment processes.
- Cross-functional Collaboration: Work closely with product managers, analysts, and marketing teams to support their data needs and enable data-driven decision-making.
- Dashboard Enablement: Collaborate with BI teams to feed clean data into dashboards (Power BI, Tableau, etc.).
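To make the pipeline responsibility concrete, here is a minimal ETL sketch. All names (the `RAW_CSV` export, the `campaign_stats` table) are hypothetical; a real pipeline would pull from a database, API, or S3 object and load into a warehouse such as Redshift.

```python
import csv
import io
import sqlite3

# Hypothetical raw campaign export; stands in for data from a source system.
RAW_CSV = """campaign_id,clicks,spend
c-001,120,45.50
c-002,,12.00
c-001,120,45.50
"""

def extract(raw: str) -> list:
    """Extract: parse the raw CSV into dict rows."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list) -> list:
    """Transform: drop incomplete rows, cast types, de-duplicate."""
    seen, out = set(), []
    for r in rows:
        if not r["clicks"]:          # validation: skip rows missing clicks
            continue
        key = (r["campaign_id"], r["clicks"], r["spend"])
        if key in seen:              # de-duplication on the full record
            continue
        seen.add(key)
        out.append((r["campaign_id"], int(r["clicks"]), float(r["spend"])))
    return out

def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write cleaned rows into the (hypothetical) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS campaign_stats "
                 "(campaign_id TEXT, clicks INTEGER, spend REAL)")
    conn.executemany("INSERT INTO campaign_stats VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
```

In production the same three-stage structure is typically wrapped in scheduler tasks (Airflow, Jenkins, cron) so each stage can be retried independently.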
Must Have Skills:
Python Programming
- Strong hands-on experience in Python for scripting, data processing, and API handling.
- Familiarity with libraries such as pandas, requests, and SQLAlchemy, plus the standard-library json module.
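The "API handling" part of this skill usually means turning a nested JSON response into flat rows. A small stdlib-only sketch, with a hypothetical payload shaped like a marketing API response:

```python
import json

# Hypothetical JSON payload of the shape a campaign-metrics API might return.
payload = json.loads("""
{
  "data": [
    {"campaign": "spring_sale", "metrics": {"clicks": 310, "spend": 84.2}},
    {"campaign": "retargeting", "metrics": {"clicks": 95,  "spend": 21.7}}
  ]
}
""")

# Flatten nested records into rows - the usual first step before handing
# data to pandas or loading it into SQL.
rows = [
    {"campaign": item["campaign"], **item["metrics"]}
    for item in payload["data"]
]
total_spend = round(sum(r["spend"] for r in rows), 2)
```

With pandas, the same flattening is often a one-liner via `pandas.json_normalize`.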
SQL & Databases
- Proficient in writing complex SQL queries.
- Experience working with databases like Redshift, PostgreSQL, or MySQL.
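As an illustration of the "complex SQL" expected here, a CTE combined with a window function, run against an in-memory SQLite database (the `daily_spend` table and its data are invented for the example; the same SQL runs on PostgreSQL and Redshift):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_spend (campaign TEXT, day TEXT, spend REAL);
INSERT INTO daily_spend VALUES
  ('a', '2024-01-01', 10.0),
  ('a', '2024-01-02', 20.0),
  ('b', '2024-01-01', 5.0);
""")

# CTE + window function: per-campaign running total of spend.
QUERY = """
WITH ordered AS (
  SELECT campaign, day, spend
  FROM daily_spend
)
SELECT campaign, day,
       SUM(spend) OVER (PARTITION BY campaign ORDER BY day) AS running_spend
FROM ordered
ORDER BY campaign, day;
"""
rows = conn.execute(QUERY).fetchall()
```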
APIs
- Working knowledge of REST APIs and handling JSON/XML responses.
- Experience handling API rate limits, pagination, and authentication.
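A sketch of the pagination-plus-backoff pattern this skill refers to. The paginated endpoint is simulated with an in-memory dict; a real client would make HTTP calls (e.g. with requests) and back off on 429 responses:

```python
import time

# Simulated cursor-paginated endpoint: cursor -> (items, next_cursor).
# Hypothetical stand-in for an HTTP API.
PAGES = {
    None: ([1, 2], "p2"),
    "p2": ([3, 4], "p3"),
    "p3": ([5], None),
}

def fetch_page(cursor):
    return PAGES[cursor]

def fetch_all(max_retries: int = 3) -> list:
    """Follow cursor pagination until the API reports no next page,
    retrying with exponential backoff on (simulated) rate-limit errors."""
    items, cursor = [], None
    while True:
        for attempt in range(max_retries):
            try:
                batch, cursor = fetch_page(cursor)
                break
            except Exception:             # e.g. HTTP 429 Too Many Requests
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
        items.extend(batch)
        if cursor is None:                # no next page: we are done
            return items

all_items = fetch_all()
```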
Data Engineering Foundations
- Good understanding of ETL workflows and pipeline design.
- Exposure to tools like Airflow, Luigi, or similar schedulers.
Data Hygiene
- Familiarity with data validation, de-duplication, and data integrity checks.
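The three hygiene steps above can be sketched in a few lines of plain Python; the records and field names are invented for illustration:

```python
# Hypothetical ingested records, as pulled from an API or staging table.
rows = [
    {"id": 1, "email": "a@x.com", "clicks": 10},
    {"id": 1, "email": "a@x.com", "clicks": 10},   # exact duplicate
    {"id": 2, "email": "",        "clicks": 7},    # fails validation
    {"id": 3, "email": "c@x.com", "clicks": -5},   # fails integrity check
]

def is_valid(row: dict) -> bool:
    """Validation + integrity: required fields present, metrics in range."""
    return bool(row["email"]) and row["clicks"] >= 0

# De-duplicate on the primary key while preserving input order.
seen, clean = set(), []
for row in rows:
    if row["id"] in seen or not is_valid(row):
        continue
    seen.add(row["id"])
    clean.append(row)
```

In practice the same checks are usually expressed as pandas filters or SQL constraints, but the logic is identical.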
Good to Have Skills:
- Experience with cloud platforms like AWS (especially S3, Redshift, Lambda).
- Familiarity with version control (Git) and CI/CD workflows.
- Exposure to marketing data sources such as Google Ads, Facebook Ads, Branch.io, etc.
- Ability to debug and optimize slow-running SQL queries.
- Experience with message queues or event-based systems (Kafka, Pub/Sub) is a plus.
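On debugging slow queries: SQLite's `EXPLAIN QUERY PLAN` shows the same scan-vs-index distinction that `EXPLAIN` surfaces in PostgreSQL and Redshift. A minimal sketch with an invented `events` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, ts TEXT)")

query = "SELECT * FROM events WHERE user_id = 42"

# Without an index, the planner falls back to a full table scan.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# An index on the filter column lets the planner seek instead of scan.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
```

Reading the plan before and after adding an index is often the fastest way to explain (and fix) a slow-running query.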