
QBurst

Senior/Lead Engineer - Data Engineering/Databricks

  • Posted 18 hours ago

Job Description

Description:

We are looking for an experienced Data Engineer with proven expertise in building and optimizing ETL pipelines on Databricks, leveraging Delta Lake and Spark SQL. The ideal candidate will have a strong foundation in Python and SQL, a solid understanding of data storage formats such as Parquet and Delta, and experience in performance optimization, testing, and automated workflows.


Responsibilities:
  • ETL Development: Design and implement well-structured Databricks notebooks for ETL workflows, following best practices
  • Data Storage: Utilize Delta Lake for data storage, demonstrating understanding of its benefits such as ACID transactions, schema enforcement, and time travel
  • Data Transformation: Apply Spark SQL for complex data transformations and aggregations
  • Delta Live Tables (DLT): Design and manage declarative, incremental pipelines on top of Delta Lake using Delta Live Tables. Leverage built-in orchestration, dependency management, and data quality checks for reliable ETL workflows
  • File Formats: Work with a range of storage formats, including Parquet, ORC, Avro, and JSON, ensuring versatility in handling different data sources
  • Delta Sharing: Configure and manage Delta Sharing for secure, governed data distribution, integrating with Unity Catalog for access control, auditing, and automation as part of the data delivery process
  • Data Governance: Leverage Unity Catalog for data lineage, tagging, and access control, enhancing data discoverability and ensuring compliance
  • Error Handling & Validation: Implement proper exception handling, logging, and data validation checks to ensure data quality
  • Automation: Develop automated triggers and job orchestration for pipeline execution
  • Documentation: Maintain comprehensive documentation covering the project, its dependencies, execution steps, and recommendations for stakeholders
  • Test cases & Validation: Develop and maintain test cases to validate data transformations, schema consistency, and business rules, ensuring data accuracy and reliability across all pipeline stages
  • Performance Optimization: Optimize ETL processes for scalability and reduced processing time
  • Collaboration: Work closely with business analysts, data scientists, and stakeholders to deliver actionable insights
  • Security Best Practices: Apply knowledge of encryption, masking, and role-based access control in Databricks and cloud storage
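The error-handling and validation responsibilities above can be sketched in plain Python. This is an illustrative outline only: the field names, rules, and `DataValidationError` type are hypothetical, and in a Databricks pipeline the same fail-fast pattern would wrap Spark DataFrame checks rather than Python lists.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl.validation")


class DataValidationError(Exception):
    """Raised when a batch fails a validation rule (hypothetical type)."""


def validate_batch(rows, required_fields=("order_id", "amount")):
    """Check each record for required fields and a non-negative amount.

    Returns the rows unchanged if all pass; logs each rejected row and
    raises DataValidationError otherwise, mirroring a fail-fast
    validation step in an ETL pipeline.
    """
    failures = []
    for i, row in enumerate(rows):
        missing = [f for f in required_fields if row.get(f) is None]
        if missing:
            failures.append((i, f"missing fields: {missing}"))
        elif row["amount"] < 0:
            failures.append((i, "negative amount"))
    if failures:
        for idx, reason in failures:
            logger.error("row %d rejected: %s", idx, reason)
        raise DataValidationError(f"{len(failures)} row(s) failed validation")
    logger.info("batch of %d rows passed validation", len(rows))
    return rows
```

The same idea scales to Delta Live Tables, where expectations play the role of the validation rules and quarantine or drop failing records declaratively.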

Requirements
  • 5+ years in Data Engineering, with strong expertise in Databricks, PySpark, and Python.
  • Dynamic, self-motivated engineer with strong logical reasoning and problem-solving skills
  • Strong experience in Python and SQL, with extensive debugging skills
  • Version control & DevOps: Git/GitHub/GitLab for versioning, integration with CI/CD
  • Hands-on experience with Databricks and Delta Lake
  • Solid understanding of Spark SQL and distributed computing concepts
  • Experience in ETL design, data modeling, and pipeline automation
  • Knowledge of error handling, logging, and data validation techniques
  • Experience with unit testing and integration testing in data pipelines
  • Proven track record in performance tuning of large-scale data processing jobs
  • Strong problem-solving and analytical skills
  • Excellent written and verbal communication skills
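The unit-testing expectation above is easiest to meet when transformations are factored as pure functions over plain records, so they can be tested without a Spark cluster. A minimal sketch (the schema and function names here are illustrative assumptions, not part of any existing codebase):

```python
def add_total_price(records):
    """Pure transformation: derive total_price = quantity * unit_price.

    Keeping business logic in pure functions like this makes it easy to
    unit-test before wiring it into a Spark/Databricks job.
    """
    return [
        {**r, "total_price": r["quantity"] * r["unit_price"]}
        for r in records
    ]


def test_add_total_price():
    out = add_total_price([{"quantity": 3, "unit_price": 2.5}])
    assert out[0]["total_price"] == 7.5
    # The transformation must not mutate its input records
    src = {"quantity": 1, "unit_price": 1.0}
    add_total_price([src])
    assert "total_price" not in src
```

Integration tests would then exercise the same logic end to end against a small Delta table in a test workspace.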
Preferred Skills:

  • Experience with cloud platforms (Azure, AWS, or GCP) in a data engineering context.
  • Knowledge of data governance and compliance best practices.

About Company

QBurst is a full-service software solutions provider that works with clients to maximize the effectiveness of their business through the adoption of digital technology.

Job ID: 143679835