Role Description:
We are seeking a Senior Data Engineer with expertise in graph data technologies to join our data engineering team and help build scalable, high-performance data pipelines and advanced data models that power next-generation applications and analytics. The role combines core data engineering skills with specialized knowledge of graph data structures, graph databases, and relationship-centric data modeling, enabling the organization to leverage connected data for deep insights, pattern detection, and advanced analytics use cases. The ideal candidate will have a strong background in data architecture, big data processing, and graph technologies, and will work closely with data scientists, analysts, architects, and business stakeholders to design and deliver graph-based data engineering solutions.
Roles & Responsibilities:
- Design, build, and maintain robust data pipelines using Databricks (Spark, Delta Lake, PySpark) for complex graph data processing workflows (see the PySpark sketch after this list).
- Own the implementation of graph-based data models, capturing complex relationships and hierarchies across domains.
- Build and optimize graph databases such as Stardog, Neo4j, MarkLogic, or similar to support query performance, scalability, and reliability.
- Implement graph query logic using SPARQL, Cypher, Gremlin, or GSQL, depending on platform requirements.
- Collaborate with data architects to integrate graph data with existing data lakes, warehouses, and lakehouse architectures.
- Work closely with data scientists and analysts to enable graph analytics, link analysis, recommendation systems, and fraud detection use cases.
- Develop metadata-driven pipelines and lineage tracking for graph and relational data processing.
- Ensure data quality, governance, and security standards are met across all graph data initiatives.
- Mentor junior engineers and contribute to data engineering best practices, especially around graph-centric patterns and technologies.
- Stay up to date with the latest developments in graph technology, graph ML, and network analytics.
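
To make the pipeline work above concrete, here is a minimal PySpark/Delta Lake sketch of the kind of task involved: deriving a weighted edge list from a transactional Delta table so it can be loaded into a graph database downstream. The table paths, column names, and schema are illustrative assumptions, not details of this role.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("graph-edge-extraction").getOrCreate()

# Hypothetical source table: one row per payment between two accounts.
transactions = spark.read.format("delta").load("/lake/silver/transactions")

# Collapse raw rows into weighted edges (payer -> payee) for a graph loader.
edges = (
    transactions
    .groupBy(F.col("payer_id").alias("src"), F.col("payee_id").alias("dst"))
    .agg(
        F.count("*").alias("tx_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Persist as a Delta table that a downstream graph-ingestion job can consume.
edges.write.format("delta").mode("overwrite").save("/lake/gold/account_edges")
```
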
What We Expect of You:
Must-Have Skills:
- Hands-on experience in Databricks, including PySpark, Delta Lake, and notebook-based development.
- Hands-on experience with graph database platforms such as Stardog, Neo4j, or MarkLogic (see the query sketch after this list).
- Strong understanding of graph theory, graph modeling, and traversal algorithms.
- Proficiency in workflow orchestration and performance tuning for big data processing.
- Strong understanding of AWS services.
- Ability to quickly learn, adapt, and apply new technologies, with strong problem-solving and analytical skills.
- Excellent collaboration and communication skills, with experience in the Scaled Agile Framework (SAFe), Agile delivery, and DevOps practices.
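
As a purely illustrative example of the graph-platform skills above, the sketch below runs a two-hop Cypher traversal through the official Neo4j Python driver. The connection details and the Account/SENT schema are assumptions for the example; Stardog (SPARQL) or Gremlin-based platforms would use their own query languages and drivers.

```python
from neo4j import GraphDatabase

# Assumed connection details for a local Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Link-analysis style traversal: accounts reachable within two hops of a
# flagged account -- the kind of query behind fraud-detection use cases.
CYPHER = """
MATCH (a:Account {flagged: true})-[:SENT*1..2]->(b:Account)
RETURN DISTINCT b.id AS account_id
LIMIT 25
"""

with driver.session() as session:
    for record in session.run(CYPHER):
        print(record["account_id"])

driver.close()
```
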
Good-to-Have Skills:
- Deep expertise in the biotech and pharma industries.
- Experience writing APIs that make data available to consumers (see the API sketch after this list).
- Experience with SQL/NoSQL databases and with vector databases for large language models.
- Experience with data modeling and performance tuning for both OLAP and OLTP databases.
- Experience with software engineering best practices, including but not limited to version control (Git, Subversion, etc.), CI/CD (Jenkins, Maven, etc.), automated unit testing, and DevOps.
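
For the API item above, here is a minimal sketch using FastAPI (one common choice; this posting does not prescribe a framework). The route and the in-memory lookup are hypothetical stand-ins for a real graph or warehouse query layer.

```python
from fastapi import FastAPI, HTTPException

app = FastAPI()

# Hypothetical stand-in for a graph query layer.
NEIGHBORS = {"acct-1": ["acct-2", "acct-3"]}

@app.get("/accounts/{account_id}/neighbors")
def get_neighbors(account_id: str) -> dict:
    """Return the accounts directly connected to the given account."""
    if account_id not in NEIGHBORS:
        raise HTTPException(status_code=404, detail="unknown account")
    return {"account_id": account_id, "neighbors": NEIGHBORS[account_id]}
```

Run locally with, e.g., `uvicorn app:app --reload` and query `/accounts/acct-1/neighbors`.
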
Education and Professional Certifications:
- Master's degree and 3 to 4+ years of Computer Science, IT, or related field experience.
- Bachelor's degree and 5 to 8+ years of Computer Science, IT, or related field experience.
- AWS Certified Data Engineer preferred
- Databricks certification preferred
- Scaled Agile (SAFe) certification preferred
Soft Skills:
- Excellent analytical and troubleshooting skills.
- Strong verbal and written communication skills.
- Ability to work effectively with global, virtual teams.
- High degree of initiative and self-motivation.
- Ability to manage multiple priorities successfully.
- Team-oriented, with a focus on achieving team goals.
- Ability to learn quickly, be organized and detail oriented.
- Strong presentation and public speaking skills.