Key Responsibilities:
- Serve as the subject matter expert for Databricks architecture, development, and operational best practices.
- Design, develop, and manage ETL/ELT pipelines using Python (PySpark) in Databricks (a minimal sketch follows this list).
- Leverage Unity Catalog for data lineage, security, and governance management.
- Implement and maintain CI/CD pipelines for Databricks deployments using Git and DevOps tools.
- Build and optimize scalable data architectures, including Data Lakes, Lakehouses, and Data Warehouses.
- Configure and optimize Databricks clusters, jobs, and workflows for both batch and streaming data processing (see the streaming sketch after this list).
- Monitor and tune Databricks workloads to ensure high performance and scalability.
- Collaborate with cross-functional teams to implement data governance and compliance practices.
- Provide mentorship and guidance to junior engineers on best practices and standards.
- Maintain technical documentation and provide training on Databricks tools and processes.
- Stay current with the latest Databricks features and introduce innovative solutions to improve data engineering workflows.
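
To make the pipeline and governance responsibilities above concrete, here is a minimal batch ETL sketch of the kind of work this role involves. It is illustrative only: the storage paths and the catalog, schema, and table names (main, sales, orders_clean, and so on) are hypothetical placeholders, not part of any specific environment.

```python
# Minimal batch ETL sketch for Databricks (illustrative; all names and
# paths are hypothetical). Reads raw JSON, applies light cleansing, and
# writes a Delta table under Unity Catalog's three-level namespace.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # pre-created in Databricks notebooks

# Extract: raw files landed in cloud storage (hypothetical path).
raw = spark.read.json("abfss://landing@examplestorage.dfs.core.windows.net/orders/")

# Transform: basic deduplication, typing, and null filtering.
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .filter(F.col("order_id").isNotNull())
)

# Load: write to a Unity Catalog table (catalog.schema.table), so lineage
# and access control are managed by the catalog rather than by file paths.
orders.write.mode("overwrite").saveAsTable("main.sales.orders_clean")
```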
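For the streaming half of the batch/streaming responsibility, a common Databricks pattern is Auto Loader ("cloudFiles") with Structured Streaming. The sketch below again assumes hypothetical paths and table names.

```python
# Incremental ingestion sketch using Databricks Auto Loader with
# Structured Streaming (illustrative; names and paths are hypothetical).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (
    spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation",
                 "abfss://meta@examplestorage.dfs.core.windows.net/schemas/orders/")
         .load("abfss://landing@examplestorage.dfs.core.windows.net/orders/")
)

(
    stream.writeStream
          .option("checkpointLocation",
                  "abfss://meta@examplestorage.dfs.core.windows.net/checkpoints/orders/")
          .trigger(availableNow=True)  # process pending files, then stop; drop for continuous
          .toTable("main.sales.orders_stream")
)
```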
Required Skills & Experience:
- 5+ years of experience in data engineering with hands-on expertise in Databricks and Apache Spark.
- Proficiency in Unity Catalog for data lineage, security, and governance.
- Experience building and optimizing ETL pipelines using Azure Data Factory, Informatica, or similar tools.
- Strong understanding of CI/CD practices and Git-based version control integrated with Databricks (a CI-oriented sketch follows this list).
- Expertise in SQL development and performance tuning for large-scale datasets (see the tuning sketch after this list).
- Knowledge of the Azure ecosystem, including Azure Data Lake Storage and Azure Storage.
- Experience with both batch and streaming data pipelines.
- Familiarity with data modeling and dimensional design (e.g., star schema).
- Understanding of data governance, compliance, and security best practices.
- Excellent communication, problem-solving, and multitasking skills.
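
As a small illustration of the SQL tuning expectation above, the snippet below shows one routine Databricks optimization: Delta OPTIMIZE with Z-ordering, which compacts small files and co-locates rows by a frequently filtered column so data skipping can prune files at query time. Table and column names are hypothetical.

```python
# One common Databricks tuning step (illustrative; names hypothetical).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files and cluster rows by a frequent filter column.
spark.sql("OPTIMIZE main.sales.orders_clean ZORDER BY (order_ts)")

# Queries filtering on the Z-ordered column then scan far fewer files.
daily = spark.sql("""
    SELECT order_id, amount
    FROM main.sales.orders_clean
    WHERE order_ts >= date_sub(current_date(), 7)
""")
daily.explain()  # inspect the physical plan to confirm file pruning
```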
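On the CI/CD point, one widely used practice is keeping transformations in plain, testable functions so a Git-triggered pipeline can run unit tests before deploying to Databricks. A minimal sketch, with hypothetical function and column names, follows.

```python
# CI-friendly layout sketch (hypothetical names): the transformation lives
# in a plain function, so a Git-based pipeline can unit-test it with a
# local SparkSession before notebooks/jobs are deployed to Databricks.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

def dedupe_orders(df: DataFrame) -> DataFrame:
    """Drop duplicate orders and rows missing an order_id."""
    return df.dropDuplicates(["order_id"]).filter(F.col("order_id").isNotNull())

def test_dedupe_orders():
    # Runs locally under pytest; no Databricks workspace required.
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [("o1", 10.0), ("o1", 10.0), (None, 5.0)], ["order_id", "amount"]
    )
    assert dedupe_orders(df).count() == 1
```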