- 6+ years of experience in data engineering or big data technologies.
- Strong hands-on experience with:
  - Azure Databricks
  - Azure Data Factory (ADF)
  - Apache Spark (PySpark/Scala)
- Good understanding of distributed computing and data processing models.
- Experience with Azure Data Lake Storage (ADLS Gen2).
- Strong SQL skills and experience with data modeling.
- Experience in building scalable ETL/ELT pipelines.
- Knowledge of data partitioning, performance tuning, and optimization techniques.
- Familiarity with CI/CD pipelines (Azure DevOps).
Preferred Skills
- Experience with Delta Lake and Lakehouse architecture.
- Knowledge of streaming data processing (Structured Streaming, Kafka, Event Hub).
- Experience with Power BI or other visualization tools.
- Familiarity with Python, Scala, or SQL-based programming.
- Understanding of data governance and security frameworks.
Soft Skills
- Strong analytical and problem-solving skills.
- Good communication and stakeholder management abilities.
- Ability to work independently and in a collaborative team environment.
Preferred Qualifications
- BS degree in Computer Science or Engineering, or equivalent experience.
Roles & Responsibilities
- Design, develop, and optimize data pipelines using Azure Databricks and Azure Data Factory (ADF).
- Build and manage large-scale distributed data processing solutions using Apache Spark.
- Develop ETL/ELT workflows for structured and unstructured data.
- Implement and manage data ingestion, transformation, and orchestration pipelines.
- Work with data lakes (Azure Data Lake Storage Gen2) and data warehousing solutions.
- Optimize performance of Spark jobs and Databricks clusters.
- Collaborate with data architects and stakeholders to design scalable data solutions.
- Ensure adherence to data quality, governance, and security best practices.
- Troubleshoot and resolve data pipeline issues.