Responsibilities:
Build Data Pipelines:
- Utilize PySpark and Python to construct efficient and scalable data pipelines.
- Integrate data from multiple source systems into a unified target system.
Orchestrate Pipelines with Airflow:
- Use Apache Airflow to orchestrate and schedule data pipelines, ensuring timely and reliable execution.
Enhance Existing Pipelines:
- Understand existing data pipelines and make enhancements based on evolving business requirements.
- Implement improvements to optimize performance and maintainability.
Debugging and Root Cause Analysis:
- Troubleshoot and resolve any failures in data pipelines promptly.
- Conduct root cause analysis for pipeline failures and implement corrective measures.
Collaboration with Stakeholders:
- Work closely with various stakeholders, both within and across teams.
- Communicate effectively to understand and address business needs related to data processing.
Weekend and Shift Support:
- Be available to work weekends and in shifts when necessary to support business operations.
Requirements:
Experience:
- 3-5 years of experience as a data engineer, with a demonstrated understanding of data engineering principles.
Technical Skills:
- Proficient in SQL, Python, and PySpark for designing and implementing data solutions.
- Knowledge of data warehousing techniques and dimensional modeling.
Orchestration Tools:
- Experience with Apache Airflow for orchestrating complex data workflows.
- Familiarity with containerization using Docker and version control systems.
Data Modeling and Transformation:
- Strong proficiency in data modeling techniques, with expertise in designing and implementing effective data structures.
- Knowledge of dbt (data build tool) for transforming and modeling data.
Cloud Platform:
- AWS knowledge is a plus, including familiarity with cloud-based data services and infrastructure.