
Astellas Pharma

Databricks Engineer

  • Posted 4 hours ago

Job Description

About Astellas

Astellas is a global life sciences company committed to turning innovative science into value for patients. We provide transformative therapies in disease areas that include oncology, ophthalmology, urology, immunology and women's health. Through our research and development programs, we are pioneering new healthcare solutions for diseases with high unmet medical need. Learn more at Astellas.com (https://www.astellas.com/en).

Are you driven to make a real difference in the lives of patients?

We're seeking passionate individuals who thrive in dynamic environments, embrace new ideas, and aren't afraid to take intelligent risks. People who act with unwavering integrity and are deeply committed to making a tangible impact.

Purpose And Scope

The Databricks Engineer is responsible for building and enhancing the data processing pipelines and distributed compute workloads that run on the Databricks Platform. This role focuses on writing scalable PySpark and SQL code, designing efficient Delta Lake data flows, and implementing reliable job orchestration patterns that support high-volume, production-grade data operations. You will work directly within Databricks notebooks and workflows to build ingestion and transformation logic, optimize cluster usage, and ensure pipelines meet performance, reliability, and cost expectations.

This position works closely with Data Engineers, Platform Engineering, and Data Science teams to translate technical requirements into well-structured data pipelines and automated jobs. The role involves debugging distributed compute issues, tuning Spark performance, enforcing coding and data quality standards, and integrating pipelines with CI/CD and monitoring tools. Your work ensures that downstream analytics, ML models, and business applications have access to accurate, timely, and well-organized data across the Astellas Data Platform.
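As a rough illustration of the pipeline work described above, the sketch below builds a Delta Lake MERGE (upsert) statement of the kind commonly used in Databricks transformation jobs. The table and column names are hypothetical, and in a notebook the generated statement would be executed with `spark.sql(...)`; this is a minimal sketch, not a prescribed implementation.

```python
def build_merge_sql(target: str, source: str, keys: list[str], update_cols: list[str]) -> str:
    """Build a Delta Lake MERGE statement for an idempotent upsert
    from a staging view into a target table (names are illustrative)."""
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in update_cols)
    cols = ", ".join(keys + update_cols)
    vals = ", ".join(f"s.{c}" for c in keys + update_cols)
    return (
        f"MERGE INTO {target} t\n"
        f"USING {source} s\n"
        f"ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )

# In a Databricks notebook this would run as, e.g.:
#   spark.sql(build_merge_sql("silver.patients", "bronze_patients_vw",
#                             ["patient_id"], ["name", "updated_at"]))
```

Generating MERGE statements from a shared helper like this is one way teams keep upserts idempotent and consistent across many pipelines.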

Responsibilities And Accountabilities

  • Develop & Maintain Scalable Data Pipelines: Develop and maintain ETL/ELT pipelines using PySpark, Spark SQL, Auto Loader, and Delta Live Tables to support data ingestion and transformation needs.
  • Implement Robust Lakehouse Architecture: Implement and enhance Medallion (Bronze/Silver/Gold) layers by applying Delta Lake features such as schema evolution and optimization techniques.
  • Integrate Data Across Cloud Platforms: Ingest and harmonize structured, semi-structured, and unstructured data from multiple cloud environments including Azure, AWS, and enterprise object storage.
  • Develop Reusable Engineering Frameworks: Create and maintain reusable Python, PySpark, and YAML-based libraries and patterns to standardize ingestion, transformation, automation, and engineering workflows across teams.
  • Implement Data Quality & Governance: Implement data validation checks and follow Unity Catalog governance standards for access control, lineage, external locations, and PII/PHI controls.
  • CI/CD & Deployment Automation: Utilize Azure DevOps and Databricks Asset Bundles (DABs) to establish automated build, test, and deployment workflows; ensure source control discipline and promote engineering best practices.
  • Optimize Performance & Cost Efficiency: Apply standard Spark performance techniques such as partitioning and query optimization to improve reliability and efficiency of data workloads.
  • Collaborate with Data & Platform Teams: Work closely with Business, Analysts, SMEs, and Platform Engineering teams to translate requirements into scalable data solutions.
  • Participate in Technical Reviews and Knowledge Sharing: Contribute to design discussions, share learnings with peers, and seek guidance from senior engineers to continuously improve engineering practices.
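To make the Auto Loader and Medallion bullets above concrete, here is a hedged sketch of a Bronze-layer ingestion configuration. The storage paths and table names are hypothetical; the commented lines show how the options would feed a `spark.readStream` call in a Databricks notebook.

```python
def bronze_autoloader_options(file_format: str, schema_location: str) -> dict[str, str]:
    """Standard Auto Loader ("cloudFiles") options for Bronze-layer
    ingestion with schema inference and schema evolution enabled."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,        # where inferred schemas are tracked
        "cloudFiles.schemaEvolutionMode": "addNewColumns",   # evolve when new columns appear
    }

opts = bronze_autoloader_options(
    "json", "abfss://lake@acct.dfs.core.windows.net/_schemas/events"
)

# In a notebook, the Bronze stream would look roughly like:
#   (spark.readStream.format("cloudFiles")
#        .options(**opts)
#        .load("abfss://lake@acct.dfs.core.windows.net/raw/events")
#        .writeStream
#        .option("checkpointLocation", ".../_checkpoints/bronze_events")
#        .toTable("bronze.events"))
```

Centralizing the option dictionary in a shared helper is one way to standardize ingestion patterns across teams, per the reusable-frameworks responsibility above.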

Required Qualifications

  • Bachelor's degree in Computer Science, Engineering, or a related discipline, or equivalent experience.
  • 5+ years of Data Engineering experience, including 3+ years working on Databricks.
  • Proven experience designing enterprise-scale data architectures and distributed systems.
  • Expertise in Delta Lake internals (file pruning, compaction, metadata management, CDF tuning).
  • Experience working in complex migrations (legacy ETL, cloud migrations, warehouse consolidation).
  • Experience developing reusable engineering frameworks, libraries, and standards.
  • Strong proficiency in Python, SQL, and PySpark for building scalable data pipelines.
  • Experience with cloud platforms such as Azure, AWS, or GCP, including working with object storage.
  • Hands-on experience with warehouse/Lakehouse technologies, including Synapse, Snowflake, or Redshift.
  • Knowledge of traditional ETL tools, such as Informatica, Talend, or equivalent.
  • Proficiency with Git-based version control and DevOps tooling (Azure DevOps, GitHub, Bitbucket).
  • Experience with Databricks Workflows and orchestration tools for automated data processing.
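For qualifications such as Delta Lake compaction and CDF tuning, the helpers below sketch the routine maintenance statements involved: small-file compaction with Z-ordering, old-file cleanup, and reading the change data feed. Table names and the retention value are illustrative, and each statement would run via `spark.sql(...)` on a cluster; treat this as a sketch under those assumptions.

```python
def delta_maintenance_sql(table: str, zorder_cols: list[str], retain_hours: int = 168) -> list[str]:
    """Generate routine Delta Lake maintenance statements:
    compaction with Z-ordering, then cleanup of unreferenced files."""
    return [
        f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})",  # compact + co-locate data
        f"VACUUM {table} RETAIN {retain_hours} HOURS",             # drop old, unreferenced files
    ]

def cdf_read_sql(table: str, start_version: int) -> str:
    """Read incremental changes via the Change Data Feed (requires
    delta.enableChangeDataFeed = true on the table)."""
    return f"SELECT * FROM table_changes('{table}', {start_version})"
```

The default seven-day (168-hour) VACUUM retention mirrors Delta Lake's default safety window for concurrent readers and time travel.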

Preferred Qualifications

  • Experience with Delta Live Tables (DLT), Auto Loader, and streaming or hybrid (batch + streaming) architectures—including CDC, event sequencing, and incremental processing.
  • Hands-on expertise with Unity Catalog governance, including lineage, ABAC/RBAC access controls, external locations, and secure data sharing patterns.
  • Experience working in regulated industries such as pharmaceuticals, healthcare, or life sciences.
  • Proficiency with MLflow and MLOps lifecycle management, including model tracking, registry operations, and production deployment workflows.
  • Demonstrated ability to build reusable shared libraries, engineering frameworks, and standardized patterns for enterprise-scale data platforms.
  • Databricks Certified Data Engineer Associate certification (strongly preferred).
  • Experience with serverless compute models, Photon runtime, and Delta Sharing for cross-domain or cross-organization data exchange.
  • Familiarity with data mesh or domain-oriented data product architectures supporting federated ownership and self-service data capabilities.
  • Hands-on experience automating DevOps workflows using Databricks Asset Bundles (DABs) across multi-workspace or multi-environment deployments.

Working Environment

At Astellas we recognize the importance of work/life balance, and we are proud to offer a hybrid working solution allowing time to connect with colleagues at the office with the flexibility to also work from home. We believe this provides the most productive work environment for all employees to succeed and deliver. Hybrid work from certain locations may be permitted in accordance with Astellas' Responsible Flexibility policy.

What awaits you at Astellas

  • Global collaboration: Become part of a connected global business of like-minded life science leaders, all dedicated to improving patients' lives worldwide.
  • Real-world patient impact: Contribute to transformative therapies that reach patients around the world, knowing your work makes a difference every day.
  • Relentless Innovation: Join a company at the forefront of scientific breakthroughs, where you'll have the opportunity to shape the future of healthcare.
  • A Culture of Growth: Chart your own course within a supportive environment that values your contributions, champions your development, and empowers you to pursue your passions.

Our Organizational Values and Behaviors

Values: Innovation, Integrity and Impact sit at the heart of what we do.

Behaviors: We come together as One Astellas, working with courage and a sense of urgency. We are outcome focused and consistently take accountability for our personal contribution.

Category PlatformX (SUB00000710)

Astellas is committed to equality of opportunity in all aspects of employment.

EOE including Disability/Protected Veterans

Job ID: 147497297
