Role Description:
Let's do this. Let's change the world. We are looking for a highly motivated, expert Data Engineer who can own the design and development of complex data pipelines, solutions, and frameworks. The ideal candidate will be responsible for designing, developing, and maintaining data pipelines, data integration frameworks, and metadata-driven architectures that enable seamless data access and analytics. This role calls for deep expertise in big data processing, distributed computing, data modeling, and governance frameworks to support self-service analytics, AI-driven insights, and enterprise-wide data management.
Roles & Responsibilities:
- Design, develop, and maintain complex ETL/ELT data pipelines in Databricks using PySpark, Scala, and SQL to process large-scale datasets (see the illustrative sketch after this list)
- Understand the biotech/pharma and related domains, and build highly efficient data pipelines to migrate and deploy complex data across systems
- Design and implement solutions to enable unified data access, governance, and interoperability across hybrid cloud environments
- Ingest and transform structured and unstructured data from databases (PostgreSQL, MySQL, SQL Server, MongoDB, etc.), APIs, logs, event streams, images, PDFs, and third-party platforms
- Ensure data integrity, accuracy, and consistency through rigorous quality checks and monitoring
- Build and apply data quality, data validation, and verification frameworks
- Innovate, explore, and implement new tools and technologies to improve the efficiency of data processing
- Proactively identify and implement opportunities to automate tasks and develop reusable frameworks
- Work in an Agile and Scaled Agile (SAFe) environment, collaborating with cross-functional teams, product owners, and Scrum Masters
- Use JIRA, Confluence, and Agile DevOps tools to manage sprints, backlogs, and user stories
- Support continuous improvement, test automation, and DevOps practices in the data engineering lifecycle
- Collaborate and communicate effectively with product and cross-functional teams to understand business requirements and translate them into technical solutions
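
To make the pipeline and data-quality responsibilities above concrete, here is a minimal, illustrative PySpark sketch of an ETL step with a simple quality gate. The paths, table names, columns, and thresholds are hypothetical placeholders, not details of any specific environment; on Databricks the Delta format and a pre-created `spark` session are available by default, while a local run would need the Delta Lake package.

```python
# Minimal sketch of an ETL/ELT step with a basic data-quality gate.
# All paths, table names, and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl_sketch").getOrCreate()

# Extract: read raw source data (e.g., an export from an operational database).
raw = spark.read.format("json").load("/mnt/raw/orders/")  # hypothetical path

# Transform: normalize types, derive analytics-friendly columns, deduplicate.
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .dropDuplicates(["order_id"])
)

# Quality check: fail fast if required keys are missing.
null_keys = orders.filter(F.col("order_id").isNull()).count()
if null_keys > 0:
    raise ValueError(f"Data quality check failed: {null_keys} rows with null order_id")

# Load: write to a Delta table for downstream analytics (Delta is the default on Databricks).
orders.write.format("delta").mode("overwrite").saveAsTable("analytics.orders")  # hypothetical table
```

In practice, a step like this would be parameterized and driven by metadata (source, target, and rule definitions) rather than hard-coded, in line with the metadata-driven architecture this role describes.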
Must-Have Skills:
- Hands-on experience in Databricks, PySpark, SparkSQL, Apache Spark, AWS, Python, SQL, and Scaled Agile methodologies
- Proficiency in workflow orchestration and in performance tuning of big data processing workloads
- Strong understanding of AWS services
- Ability to quickly learn, adapt, and apply new technologies
- Strong problem-solving and analytical skills
- Excellent communication and teamwork skills
- Experience with SAFe, Agile delivery, and DevOps practices
Good-to-Have Skills:
- Data engineering experience in Biotechnology or Pharma
- Experience in writing APIs to expose data to consumers (see the illustrative sketch after this list)
- Experience with SQL/NoSQL databases and with vector databases for LLM applications
- Experience with data modeling and performance tuning for OLAP and OLTP
- Knowledge of software engineering best practices: version control (Git, Subversion), CI/CD (Jenkins, Maven), automated testing, and DevOps
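
As an illustration of the "writing APIs to expose data to consumers" item above, here is a minimal, hypothetical sketch of a read-only endpoint over a curated extract. FastAPI, the Parquet path, and the field names are assumptions made for illustration; the role does not prescribe a specific framework or data source.

```python
# Minimal sketch of a read-only API exposing curated data to consumers.
# FastAPI, the Parquet path, and the field names are hypothetical choices.
import json

import pandas as pd
from fastapi import FastAPI, HTTPException

app = FastAPI(title="orders-data-api-sketch")

# In a governed setup this would query a warehouse or lakehouse table;
# a local Parquet extract stands in for that source here.
ORDERS_PATH = "data/orders.parquet"  # hypothetical path

@app.get("/orders/{order_id}")
def get_order(order_id: str):
    orders = pd.read_parquet(ORDERS_PATH)
    match = orders[orders["order_id"] == order_id]
    if match.empty:
        raise HTTPException(status_code=404, detail="order not found")
    # to_json handles NumPy scalar types so the payload is JSON-serializable.
    return json.loads(match.to_json(orient="records"))[0]
```

Assuming the module is saved as main.py, running `uvicorn main:app` would serve the endpoint locally for consumers to query.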
Education and Professional Certifications:
- 5 to 8 years of experience in Computer Science, IT, or a related field
- AWS Certified Data Engineer (preferred)
- Databricks Certification (preferred)
- SAFe certification (preferred)
Soft Skills:
- Excellent analytical and troubleshooting skills
- Strong verbal and written communication skills
- Ability to work effectively with global, virtual teams
- High degree of initiative and self-motivation
- Ability to manage multiple priorities
- Team-oriented, with a focus on achieving team goals
- Ability to learn quickly, be organized, and detail-oriented
- Strong presentation and public speaking skills