Job Title: Senior Data Engineer
What You Will Do
Let's do this. Let's change the world. In this vital role, you will design, develop, and optimize data pipelines, data integration frameworks, and metadata-driven architectures that enable seamless data access and analytics. This role requires deep expertise in big data processing, distributed computing, data modeling, and governance frameworks to support self-service analytics, AI-driven insights, and enterprise-wide data management.
Roles & Responsibilities
- Design, develop, and maintain scalable ETL/ELT pipelines to support structured, semi-structured, and unstructured data processing across the Enterprise Data Fabric.
- Implement real-time and batch data processing solutions, integrating data from multiple sources into a unified, governed data fabric architecture.
- Optimize big data processing frameworks using Apache Spark, Hadoop, or similar distributed computing technologies to ensure high availability and cost efficiency.
- Work with metadata management and data lineage tracking tools to enable enterprise-wide data discovery and governance.
- Ensure data security, compliance, and role-based access control (RBAC) across data environments.
- Optimize query performance, indexing strategies, partitioning, and caching for large-scale data sets.
- Develop CI/CD pipelines for automated data pipeline deployments, version control, and monitoring.
- Implement data virtualization techniques to provide seamless access to data across multiple storage systems.
- Collaborate with cross-functional teams, including data architects, business analysts, and DevOps teams, to align data engineering strategies with enterprise goals.
- Stay up to date with emerging data technologies and best practices, ensuring continuous improvement of Enterprise Data Fabric architectures.
What We Expect of You
We are all different, yet we all use our unique contributions to serve patients. The ideal candidate is a collaborative, analytical problem-solver with strong technical acumen and a passion for scalable data solutions.
Basic Qualifications
- Master's degree in Computer Science, IT, or a related field and 4 to 6 years of relevant experience
- OR
- Bachelor's degree in Computer Science, IT, or a related field and 6 to 8 years of relevant experience
Preferred Qualifications
- Hands-on experience with data engineering technologies such as Databricks, PySpark, SparkSQL, Apache Spark, AWS, Python, SQL, and Scaled Agile methodologies
- Proficiency in workflow orchestration and performance tuning of big data processing workloads
- Strong understanding of AWS services
- Experience with Data Fabric, Data Mesh, or similar enterprise-wide data architectures
- Strong problem-solving and analytical skills
- Strong communication and teamwork skills
- Experience with Scaled Agile Framework (SAFe), Agile delivery practices, and DevOps practices
- AWS Certified Data Engineer (preferred)
- Databricks certification (preferred)
- Scaled Agile SAFe certification (preferred)
Good-to-Have Skills
- Deep expertise in the biotech and pharmaceutical industries
- Experience in writing APIs to make data available to consumers
- Experience with SQL/NoSQL databases and vector databases for large language models
- Strong data modeling and performance tuning skills for OLAP and OLTP systems
- Familiarity with software engineering best practices including version control (Git, Subversion), CI/CD (e.g., Jenkins, Maven), unit testing, and DevOps
Soft Skills
- Excellent analytical and troubleshooting skills
- Strong verbal and written communication abilities
- Ability to work effectively with global, virtual teams
- High degree of initiative and self-motivation
- Ability to manage multiple priorities successfully
- Team-oriented, with a focus on achieving shared goals
- Strong organizational skills and attention to detail
- Excellent presentation and public speaking skills