
Astellas Pharma

Principal Databricks Developer

Posted 15 hours ago

Job Description

About Astellas

Astellas is a global life sciences company committed to turning innovative science into value for patients. We provide transformative therapies in disease areas that include oncology, ophthalmology, urology, immunology, and women's health. Through our research and development programs, we are pioneering new healthcare solutions for diseases with high unmet medical need. Learn more at Astellas.com (https://www.astellas.com/en).

Are you driven to make a real difference in the lives of patients?

We're seeking passionate individuals who thrive in dynamic environments, embrace new ideas, and aren't afraid to take intelligent risks. People who act with unwavering integrity and are deeply committed to making a tangible impact.

Purpose And Scope

The Principal Databricks Developer serves as a senior technical leader responsible for shaping how complex data applications, pipelines, and distributed processing workloads are built and operated on the Databricks Platform. This role combines deep hands‑on engineering expertise with architectural thinking, ensuring that our data solutions are not only well‑implemented, but also scalable, maintainable, and aligned with the long‑term direction of our data ecosystem. You will design advanced PySpark and SQL frameworks, lead the development of high‑performance data flows, and guide the team in applying best practices around Delta Lake, job orchestration, and cloud integration.

As a principal contributor, you will take ownership of the most challenging technical initiatives—solving problems at scale, driving improvements in performance and reliability, and setting the standards that others follow. You will partner closely with Data Engineering, Platform Engineering, Data Science, Architecture, and other technical teams to evaluate solution designs, influence engineering decisions, and mentor developers in modern data engineering practices.

Responsibilities And Accountabilities

  • Develop & Maintain Scalable Data Pipelines: Architect, build, and optimize ETL/ELT pipelines using PySpark, Spark SQL, Auto Loader, and Delta Live Tables to support end-to-end ingestion and transformation.
  • Implement Robust Lakehouse Architecture: Design and enhance Medallion-layer (Bronze/Silver/Gold) data models, applying Delta Lake features such as schema evolution, Change Data Feed (CDF), OPTIMIZE, and Z-Ordering to deliver performant, reliable, and cost-efficient data layers.
  • Integrate Data Across Cloud Platforms: Ingest and harmonize structured, semi-structured, and unstructured data from multiple cloud environments, including Azure, AWS, GCP, and enterprise object storage.
  • Develop Reusable Engineering Frameworks: Create and maintain reusable Python, PySpark, and YAML-based libraries and patterns to standardize ingestion, transformation, automation, and engineering workflows across teams.
  • Enforce Data Quality & Governance: Implement and operationalize automated data validation frameworks (DLT expectations, data contracts) while applying Unity Catalog governance covering permissions, lineage, external locations, and PII/PHI controls.
  • Lead CI/CD & Deployment Automation: Use Azure DevOps and Databricks Asset Bundles (DABs) to establish automated build, test, and deployment workflows; ensure source-control discipline and promote engineering best practices.
  • Optimize Performance & Cost Efficiency: Tune Spark workloads by applying partitioning, caching, and join-optimization strategies; leverage Photon, serverless SQL, and cluster right-sizing to improve runtime performance and reduce compute costs.
  • Collaborate with Data & Platform Teams: Partner closely with Data Scientists, Analysts, SMEs, and Platform Engineering teams to translate requirements into scalable data solutions and align on architectural, governance, and operational standards.
  • Operationalize Data Science Workflows: Convert prototype notebooks into production-ready pipelines, support feature engineering and batch/real-time scoring, and manage MLflow tracking and model-registry operations.
  • Develop Lightweight Analytical Applications: Build small-scale applications using Streamlit, Shiny, or Gradio to support internal stakeholders with interactive data products and insights.
  • Provide Technical Leadership & Mentorship: Lead design reviews, guide junior/mid-level engineers, and champion best practices to elevate engineering quality and technical execution across the organization.
  • Lead and manage a team of Databricks engineers, providing direction, prioritization, and delivery oversight across sprint commitments and platform objectives.
  • Coach, mentor, and develop engineers through regular 1:1s, capability plans, technical guidance, and structured feedback to build a high-performing engineering culture.
  • Lead cross-team alignment on Databricks standards (CI/CD, governance, data quality, and operational readiness), ensuring consistent adoption across domains and delivery teams.
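To illustrate the data-quality responsibility above, here is a minimal, hedged sketch of a reusable row-level validation pattern, similar in spirit to DLT expectations and data contracts. This is plain Python for illustration only, not the Databricks DLT API; the rule names and the `dose_mg` field are invented for the example.

```python
# Illustrative sketch (not the Databricks DLT API): a reusable
# "expectation" pattern for row-level data validation.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Expectation:
    name: str
    check: Callable[[dict], bool]  # returns True when the row passes

def validate(rows, expectations):
    """Split rows into (passed, failures); each failure records which rules broke."""
    passed, failures = [], []
    for row in rows:
        broken = [e.name for e in expectations if not e.check(row)]
        if broken:
            failures.append({"row": row, "violations": broken})
        else:
            passed.append(row)
    return passed, failures

# Hypothetical rules for a clinical-adjacent dataset (field names are assumptions)
rules = [
    Expectation("id_present", lambda r: r.get("id") is not None),
    Expectation("dose_non_negative", lambda r: r.get("dose_mg", 0) >= 0),
]

rows = [{"id": 1, "dose_mg": 50}, {"id": None, "dose_mg": -5}]
good, bad = validate(rows, rules)
```

In a real DLT pipeline the equivalent logic would be declared with expectation decorators on table definitions, so failing rows can be dropped, quarantined, or fail the pipeline per policy.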

Required Qualifications

  • Bachelor's degree in Computer Science, Engineering, or a related discipline, or equivalent experience.
  • 12+ years of Data Engineering experience, including 5+ years working on Databricks.
  • Proven experience designing enterprise-scale data architectures and distributed systems.
  • Deep expertise in Delta Lake internals (file pruning, compaction, metadata management, and CDF tuning).
  • Experience leading complex migrations (legacy ETL, cloud migrations, warehouse consolidation).
  • Experience developing reusable engineering frameworks, libraries, and standards.
  • Strong proficiency in Python, SQL, and PySpark for building scalable data pipelines.
  • Experience with cloud platforms such as Azure, AWS, or GCP, including working with object storage.
  • Hands-on experience with warehouse/Lakehouse technologies, including Synapse, Snowflake, or Redshift.
  • Knowledge of traditional ETL tools, such as Informatica, Talend, or equivalent.
  • Proficiency with Git-based version control and DevOps tooling (Azure DevOps, GitHub, Bitbucket).
  • Experience with Databricks Workflows and orchestration tools for automated data processing.

Preferred Qualifications

  • Experience with Delta Live Tables (DLT), Auto Loader, and streaming or hybrid (batch + streaming) architectures, including CDC, event sequencing, and incremental processing.
  • Hands-on expertise with Unity Catalog governance, including lineage, ABAC/RBAC access controls, external locations, and secure data sharing patterns.
  • Experience working in regulated industries such as pharmaceuticals, healthcare, or life sciences.
  • Proficiency with MLflow and MLOps lifecycle management, including model tracking, registry operations, and production deployment workflows.
  • Demonstrated ability to build reusable shared libraries, engineering frameworks, and standardized patterns for enterprise-scale data platforms.
  • Databricks Certified Data Engineer Professional certification (strongly preferred).
  • Experience with serverless compute models, Photon runtime, and Delta Sharing for cross-domain or cross-organization data exchange.
  • Familiarity with data mesh or domain-oriented data product architectures supporting federated ownership and self-service data capabilities.
  • Experience implementing or configuring data observability tooling to monitor quality, lineage, and pipeline health.
  • Hands-on experience automating DevOps workflows using Databricks Asset Bundles (DABs) across multi-workspace or multi-environment deployments.
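For context on the DABs item above, a Databricks Asset Bundle is declared in a `databricks.yml` file that defines the bundle, its deployment targets, and the jobs or pipelines it owns. The sketch below shows the general shape only; the bundle name, workspace hosts, job name, and notebook path are all illustrative assumptions.

```yaml
# Hedged sketch of a Databricks Asset Bundle definition (databricks.yml).
# All names, hosts, and paths here are placeholders, not real endpoints.
bundle:
  name: ingestion_pipelines

targets:
  dev:
    mode: development
    workspace:
      host: https://adb-1111111111111111.11.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://adb-2222222222222222.22.azuredatabricks.net

resources:
  jobs:
    nightly_bronze_load:
      name: nightly-bronze-load
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/ingest.py
```

A bundle like this is typically validated and deployed per environment with the Databricks CLI (for example, `databricks bundle deploy -t dev`), which is what makes multi-workspace CI/CD automation practical.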

Working Environment

At Astellas we recognize the importance of work/life balance, and we are proud to offer a hybrid working solution that allows time to connect with colleagues at the office alongside the flexibility to work from home. We believe this provides the most productive work environment for all employees to succeed and deliver. Hybrid work from certain locations may be permitted in accordance with Astellas Responsible Flexibility.

What awaits you at Astellas

  • Global collaboration: Become part of a connected global business of like-minded life science leaders, all dedicated to improving patients' lives worldwide.
  • Real-world patient impact: Contribute to transformative therapies that reach patients around the world, knowing your work makes a difference every day.
  • Relentless Innovation: Join a company at the forefront of scientific breakthroughs, where you'll have the opportunity to shape the future of healthcare.
  • A Culture of Growth: Chart your own course within a supportive environment that values your contributions, champions your development, and empowers you to pursue your passions.

Our Organizational Values and Behaviors

Values: Innovation, Integrity and Impact sit at the heart of what we do.

Behaviors: We come together as One Astellas, working with courage and a sense of urgency. We are outcome-focused and consistently take accountability for our personal contribution.

Category EnablementX (SUB00000713)

Astellas is committed to equality of opportunity in all aspects of employment.

EOE including Disability/Protected Veterans

Job ID: 147214931