Senior Data Engineer (Ontology & Knowledge Graph Systems)
We are seeking a Senior Data Engineer to design and implement a semantic data layer that unifies complex data assets across domains. This role focuses on building Palantir-style ontology and knowledge graph systems using open-source technologies to enable interpretability, analytics, and AI-driven workflows.
Key Responsibilities
- Design and implement scalable ontology or knowledge graph models representing business and technical domains.
- Build and maintain data pipelines (batch and streaming) for ingesting, transforming, and mapping heterogeneous data into semantic structures.
- Develop and optimize storage and query layers using graph databases or RDF/OWL frameworks (e.g., Neo4j, Apache Jena, TerminusDB).
- Integrate with APIs and orchestration systems to operationalize actions and workflows on ontology objects.
- Implement and maintain SPARQL, Cypher, or GraphQL interfaces for downstream applications.
- Collaborate with data scientists and AI teams to expose graph-based features for modeling and analytics.
- Ensure data lineage, versioning, and governance of ontology schemas and transformations.
- Establish telemetry, metrics, and automated tests for data quality and consistency.
- Mentor other engineers on semantic modeling, data integration patterns, and graph-based system design.
Required Skills and Experience
- 6+ years of experience in data engineering, with a strong background in distributed data systems.
- Expertise in data modeling, ontology design (RDF/OWL), and graph data structures.
- Proficiency with graph databases (Neo4j, TerminusDB, ArangoDB) and query languages (SPARQL, Cypher, GraphQL).
- Hands-on experience with Apache Spark or similar distributed data processing frameworks.
- Strong understanding of ETL/ELT workflows and data integration across multiple systems.
- Proficiency in Python, Scala, or Java.
- Experience designing and managing APIs, preferably GraphQL-based data access layers.
- Familiarity with workflow orchestration tools (Airflow, Temporal, Camunda) and CI/CD pipelines.
- Strong knowledge of data governance, schema evolution, and version control for ontology/data models.
- Excellent communication and documentation skills for working with cross-functional teams.
Preferred Qualifications
- Experience with semantic web technologies beyond core RDF/OWL modeling, such as SHACL.
- Background in AI/ML pipelines leveraging graph or semantic data.
- Understanding of reasoning and inference systems.
- Experience with cloud-based data platforms (AWS, GCP, or Azure).
Impact
This role will define the foundation of our semantic data architecture, creating a unified, interpretable data layer that powers decision intelligence, analytics, and AI-driven systems across the organization.