Graph Meets Vector
Knowledge Arbitrage’s architecture for combining graph relationships with semantic search.
The Filesystem Problem
Traditional filesystems give you hierarchy. But knowledge doesn’t work that way: a document might belong to multiple projects, reference concepts across folders, and have relationships that evolve over time.
Graph databases model these relationships naturally. But pure graph traversal can’t answer “show me documents similar to this one.”
The Hybrid Approach
Knowledge Arbitrage combines two databases:
Neo4j stores the structure:
- Files as nodes with metadata
- Directories as container nodes
- CONTAINS relationships for hierarchy
- Arbitrary relationships for cross-references
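To make the graph side concrete, here is a minimal in-memory sketch of that model. It is a plain-Python stand-in, not Neo4j code: the `FileGraph` class, the `REFERENCES` relationship type, and the example paths are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    path: str
    kind: str  # "file" or "directory"
    metadata: dict = field(default_factory=dict)


class FileGraph:
    """Toy in-memory stand-in for the graph store (hypothetical, not Neo4j)."""

    def __init__(self):
        self.nodes = {}
        self.edges = []  # (source_path, relationship_type, target_path)

    def add_node(self, path, kind, **metadata):
        self.nodes[path] = Node(path, kind, metadata)

    def relate(self, source, rel_type, target):
        self.edges.append((source, rel_type, target))

    def parent_of(self, path):
        # Follow a CONTAINS edge backwards to find the containing directory.
        for source, rel_type, target in self.edges:
            if rel_type == "CONTAINS" and target == path:
                return source
        return None

    def related(self, path):
        # Cross-references: every non-CONTAINS edge touching the node,
        # in either direction.
        found = []
        for source, rel_type, target in self.edges:
            if rel_type == "CONTAINS":
                continue
            if source == path:
                found.append((rel_type, target))
            elif target == path:
                found.append((rel_type, source))
        return found


graph = FileGraph()
graph.add_node("/projects", "directory")
graph.add_node("/projects/ai-notes.md", "file", size=2048)
graph.add_node("/planning/q3.md", "file", size=512)
graph.relate("/projects", "CONTAINS", "/projects/ai-notes.md")
graph.relate("/projects/ai-notes.md", "REFERENCES", "/planning/q3.md")
```

In a real graph database the same shape would be expressed as labeled nodes and typed relationships. The point of the sketch is only that hierarchy and cross-references live in one edge list, so a file can sit in one directory and still link to files anywhere else.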
LanceDB stores the embeddings:
- Text chunked and embedded via Cohere
- Vector similarity search
- Hybrid queries combining keyword and semantic matching
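One simple way to think about a hybrid query is as a weighted blend of a keyword score and a vector similarity score. The sketch below assumes that blend; the `alpha` weight and the helper names are hypothetical, and LanceDB's own hybrid search may combine or rerank scores differently.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    # Fraction of distinct query terms that appear in the chunk text.
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_score(query, query_vec, chunk_text, chunk_vec, alpha=0.7):
    # alpha weights semantic similarity; (1 - alpha) weights keyword overlap.
    # The default of 0.7 is an arbitrary illustrative value.
    semantic = cosine(query_vec, chunk_vec)
    lexical = keyword_score(query, chunk_text)
    return alpha * semantic + (1 - alpha) * lexical
```

With `alpha=1.0` this degenerates to pure vector search, and with `alpha=0.0` to pure keyword matching, which makes the weight a convenient knob to tune against real queries.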
How It Works
User query: "files about the AI project"
1. Query embedded via Cohere
2. LanceDB returns semantically similar chunks
3. Each chunk references a file path
4. Neo4j traverses from file nodes:
   - Get file metadata
   - Find parent directories
   - Identify related files via cross-references
5. Return ranked results with context
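The five steps above can be sketched end to end with in-memory stand-ins. Everything here is an assumption for illustration: `embed` is a toy bag-of-words function in place of the Cohere call, `CHUNKS` stands in for the LanceDB table, and the `PARENT`, `RELATED`, and `METADATA` dictionaries stand in for Neo4j lookups.

```python
import math

VOCAB = ["ai", "project", "budget", "planning", "email"]


def embed(text):
    # Toy bag-of-words embedding; a stand-in for the real embedding model.
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Stand-in for the vector table: each chunk keeps the path of its source file.
CHUNKS = [
    {"path": "/projects/ai-notes.md", "text": "ai project kickoff notes"},
    {"path": "/planning/q3.md", "text": "q3 planning budget"},
    {"path": "/mail/team-thread.eml", "text": "email about lunch"},
]

# Stand-ins for graph lookups keyed by file path.
METADATA = {"/projects/ai-notes.md": {"size": 2048}}
PARENT = {"/projects/ai-notes.md": "/projects"}
RELATED = {"/projects/ai-notes.md": ["/planning/q3.md", "/mail/team-thread.eml"]}


def search(query, top_k=2):
    query_vec = embed(query)  # step 1: embed the query
    scored = [(cosine(query_vec, embed(c["text"])), c) for c in CHUNKS]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # step 2: rank chunks
    results, seen = [], set()
    for score, chunk in scored:
        path = chunk["path"]  # step 3: chunk -> file path
        if score <= 0 or path in seen:
            continue
        seen.add(path)
        results.append({  # step 4: enrich from the graph
            "path": path,
            "score": score,
            "metadata": METADATA.get(path, {}),
            "parent": PARENT.get(path),
            "related": RELATED.get(path, []),
        })
        if len(results) == top_k:
            break
    return results  # step 5: ranked results with context


results = search("files about the AI project")
```

The file path is the join key between the two stores, which is why each chunk carries it. In this toy data only the AI notes chunk shares vocabulary with the query, so it is the single hit, and the graph lookups attach its parent directory and cross-referenced files.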
Why This Matters
Pure vector search returns results without structure. Pure graph traversal finds structure without semantic understanding.
Together, they return results that are both semantically relevant AND contextually connected. You find the document about AI, and immediately see it’s related to your Q3 planning folder and the email thread with the team.
Trade-offs
This architecture isn’t simple:
- Two databases to maintain
- Synchronization between graph and vectors
- More complex queries
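The synchronization cost can be made tangible with a small drift check. This is a sketch under assumptions: it supposes both stores can list the file paths they know about, and the function and key names are hypothetical.

```python
def find_drift(graph_paths, vector_paths):
    # Compare the file paths known to each store and report mismatches.
    graph_paths, vector_paths = set(graph_paths), set(vector_paths)
    return {
        # Files in the graph that were never embedded (or whose chunks were lost).
        "missing_embeddings": sorted(graph_paths - vector_paths),
        # Chunks whose source file no longer exists in the graph.
        "orphaned_vectors": sorted(vector_paths - graph_paths),
    }


drift = find_drift(
    graph_paths=["/projects/ai-notes.md", "/planning/q3.md"],
    vector_paths=["/projects/ai-notes.md", "/old/deleted.md"],
)
```

A check like this only detects drift. Repairing it (re-embedding missing files, deleting orphaned chunks) is the part that adds real operational work.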
For applications where both relationships and similarity matter, the complexity pays off.