Job Description:
- Designing and Implementing Data Pipelines:
- Developing and maintaining scalable data pipelines using Databricks and related technologies like Azure Data Factory, Spark, and Python.
- ETL Processes:
- Creating and optimizing ETL (Extract, Transform, Load) workflows that move data into Databricks efficiently and reliably (a minimal sketch follows this list).
- Data Modeling and Warehousing:
- Designing and implementing data models and data warehouses within the Databricks environment, including the Lakehouse Architecture.
- Data Quality and Governance:
- Ensuring data accuracy, consistency, and reliability through data quality checks, validation, and governance policies.
- Collaboration and Communication:
- Working with data scientists, analysts, and other stakeholders to understand requirements and deliver effective data solutions.
- Performance Optimization:
- Monitoring and optimizing data system performance, ensuring scalability, reliability, and cost-effectiveness within Databricks.
- Troubleshooting and Support:
- Identifying and resolving issues related to Databricks performance, data pipelines, and overall system functionality.
- Staying Updated:
- Keeping up to date with the latest Databricks features, best practices, and emerging technologies in data engineering.
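For illustration, a minimal PySpark sketch of the kind of pipeline described above: extract raw files from cloud storage, apply a transformation and a basic data-quality filter, and load the clean rows into a Delta Lake table. The storage path, column names, and table name are hypothetical placeholders, and the Delta write assumes a Databricks runtime.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical storage path and table name; substitute your own.
RAW_PATH = "abfss://raw@examplestorage.dfs.core.windows.net/orders/"
TABLE_NAME = "lakehouse.silver.orders"

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw CSV files from cloud storage.
raw = spark.read.option("header", True).csv(RAW_PATH)

# Transform: cast types and stamp each row with its load time.
orders = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .withColumn("order_date", F.to_date("order_date"))
       .withColumn("_loaded_at", F.current_timestamp())
)

# Data quality: keep rows that pass basic validity checks; in a real
# pipeline, failing rows would be routed to a quarantine table.
valid = orders.filter(F.col("order_id").isNotNull() & (F.col("amount") >= 0))

# Load: append the clean rows to a Delta table.
valid.write.format("delta").mode("append").saveAsTable(TABLE_NAME)
```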
Required Skills and Experience:
- Databricks Proficiency:
- Strong hands-on experience with Databricks, including Delta Lake, clusters, notebooks, jobs, and workspaces.
- Programming Languages:
- Proficiency in Python and SQL, with experience in PySpark for Spark-based data processing.
- Data Engineering Concepts:
- Solid understanding of data warehousing concepts, data modeling, ETL processes, and data quality principles.
- Cloud Technologies:
- Experience with cloud platforms like Azure (Azure Data Factory, Azure Data Lake, Azure Synapse) or AWS.
- Data Integration:
- Experience integrating with various data sources, including databases, cloud storage (Azure Blob Storage, ADLS), and APIs.
- Agile Development:
- Experience working within Agile development methodologies and DevOps practices.
- Communication and Collaboration:
- Excellent communication and collaboration skills to work effectively with diverse teams.
Skills:
PySpark (advanced), Python (basics), SQL, Databricks (advanced)
Airflow
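For the Airflow skill above, a minimal sketch of how an Airflow DAG might trigger an existing Databricks job, assuming Airflow 2.4+ with the apache-airflow-providers-databricks package installed. The job ID and connection ID are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

# Schedule a hypothetical Databricks job to run once per day.
with DAG(
    dag_id="daily_orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_etl = DatabricksRunNowOperator(
        task_id="run_databricks_job",
        databricks_conn_id="databricks_default",  # provider's default connection ID
        job_id=12345,  # placeholder: the ID of an existing Databricks job
    )
```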