Job Description
Position Summary
The Technical Lead, Databricks Lakehouse Migration & Integration, will play a pivotal role in enabling our pharmaceutical and life sciences clients to modernize their data platforms. This role involves leading the migration of complex clinical, regulatory, and commercial datasets into the Databricks Lakehouse environment while ensuring compliance with industry standards such as GxP, HIPAA, and GDPR. The incumbent will provide technical leadership, oversee integration strategies, and deliver scalable, secure, and compliant data solutions that support commercial operations, drug development, and clinical trials.
Job Responsibilities
Migration & Integration
Lead the migration of legacy data warehouses and research data lakes into the Databricks Lakehouse platform.
Architect and implement Delta Lakehouse solutions tailored to life sciences data domains.
Integrate Databricks with upstream laboratory systems, electronic data capture (EDC), ERP, and CRM systems, as well as downstream analytics/BI platforms.
Compliance & Governance
Ensure adherence to GxP, HIPAA, GDPR, and other regulatory requirements in data architecture and workflows.
Implement robust governance, metadata management, and security frameworks using Databricks Unity Catalog.
Leadership & Collaboration
Provide technical leadership and mentorship to data engineers and consultants.
Collaborate with client stakeholders, including R&D, clinical operations, regulatory affairs, and commercial teams.
Translate business requirements into compliant, scalable technical solutions.
Optimization & Innovation
Optimize Spark jobs, Delta tables, and streaming pipelines for large-scale clinical and commercial datasets.
Promote automation, CI/CD, and DevOps practices in data engineering projects.
Stay current with Databricks advancements and emerging technologies in pharma data management.
Education
BE/B.Tech
Master of Computer Application
Work Experience
Bachelor's or Master's degree in Computer Science, Information Technology, Bioinformatics, or related field.
7+ years of experience in data engineering, with at least 2-3 years in a technical lead role.
Strong hands-on expertise with Databricks (Spark, SQL, MLflow, Delta Lake).
Deep understanding of Delta Lakehouse architecture and its application to regulated data environments.
Proficiency in PySpark, SQL, and Python for distributed data processing.
Experience with cloud platforms (Azure, AWS, or GCP) and their data services.
Familiarity with ETL/ELT frameworks, data pipelines, and real-time streaming technologies (e.g., Kafka).
Demonstrated experience working with life sciences/pharma datasets (clinical trials, regulatory submissions, pharmacovigilance, commercial analytics).
Behavioural Competencies
Teamwork & Leadership
Motivation to Learn and Grow
Ownership
Cultural Fit
Talent Management
Technical Competencies
Problem Solving
Life Sciences Knowledge
Communication
Project Management
Databricks
PySpark
Delivery Management - BIM/Cloud Info Management