Graph Meets Vector

Knowledge Arbitrage’s architecture for combining graph relationships with semantic search.

The Filesystem Problem

Traditional filesystems give you one hierarchy: each file sits at a single path. Knowledge doesn’t work that way. A document might belong to multiple projects, reference concepts across folders, and have relationships that evolve over time.

Graph databases model these relationships naturally. But traversal only follows edges that were explicitly stored, so on its own it can’t answer “show me documents similar to this one.”

The Hybrid Approach

Knowledge Arbitrage combines two databases:

Neo4j stores the structure:

  • Files as nodes with metadata
  • Directories as container nodes
  • CONTAINS relationships for hierarchy
  • Arbitrary relationships for cross-references
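
As a rough sketch of how this structure could be written to Neo4j, the Python snippet below holds two Cypher statements and a helper that derives their parameters from a file path. The File and Directory labels, the path, name, and size properties, the REFERENCES relationship type, and the helper name are all assumptions for illustration; only CONTAINS comes from the design above.

```python
from pathlib import PurePosixPath

# Hypothetical schema: File/Directory labels and property names are assumptions.
# MERGE keeps the write idempotent, so re-indexing a file does not duplicate nodes.
UPSERT_FILE = """
MERGE (d:Directory {path: $dir_path})
MERGE (f:File {path: $file_path})
SET f.name = $name, f.size = $size
MERGE (d)-[:CONTAINS]->(f)
"""

# REFERENCES is one example of an arbitrary cross-reference type (assumed name).
LINK_FILES = """
MATCH (a:File {path: $src}), (b:File {path: $dst})
MERGE (a)-[:REFERENCES]->(b)
"""


def upsert_file_params(file_path: str, size: int) -> dict:
    """Build the parameter map for UPSERT_FILE from a POSIX-style path."""
    p = PurePosixPath(file_path)
    return {
        "dir_path": str(p.parent),
        "file_path": str(p),
        "name": p.name,
        "size": size,
    }
```

Each statement would be run through a Neo4j driver session together with its parameter map; connection handling is left out here.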

LanceDB stores the embeddings:

  • Text chunked and embedded via Cohere
  • Vector similarity search
  • Hybrid queries combining keyword and semantic matching
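
To make the chunk, embed, and search steps concrete, here is a minimal in-memory sketch in plain Python. It does not call Cohere or LanceDB: toy_embed is a bag-of-words stand-in for a real embedding model, and the weighted sum in hybrid_search is an illustrative way to mix keyword and vector scores, not a description of how LanceDB ranks hybrid queries. All function names and default values are assumptions.

```python
import math
from collections import Counter


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (size must exceed overlap)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a sparse bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that appear verbatim in the text."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    return len(terms & set(text.lower().split())) / len(terms)


def hybrid_search(query: str, chunks: list[tuple[str, str]],
                  alpha: float = 0.5, k: int = 3) -> list[tuple[float, str]]:
    """Rank (path, text) chunks by a weighted mix of vector and keyword scores."""
    q_vec = toy_embed(query)
    scored = [
        (alpha * cosine(q_vec, toy_embed(text))
         + (1 - alpha) * keyword_score(query, text), path)
        for path, text in chunks
    ]
    return sorted(scored, reverse=True)[:k]
```

In a real deployment the embedding call and the nearest-neighbor search would be delegated to Cohere and LanceDB; the sketch only shows the shape of the data that flows between them: text chunks in, (score, file path) pairs out.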

How It Works

User query: "files about the AI project"

1. Query embedded via Cohere
2. LanceDB returns semantically similar chunks
3. Each chunk references a file path
4. Neo4j traverses from file nodes:
   - Get file metadata
   - Find parent directories
   - Identify related files via cross-references
5. Return ranked results with context
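
Steps 3 to 5 can be sketched as follows, assuming steps 1 and 2 have already produced a list of scored chunk hits. The graph is represented by a plain dictionary rather than a live Neo4j connection, and the Hit and FileNode types, their fields, and the enrich function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """One chunk returned by the vector search (steps 1-2)."""
    path: str
    score: float


@dataclass
class FileNode:
    """In-memory stand-in for a file node and its graph neighborhood."""
    path: str
    metadata: dict
    parent: str
    related: list = field(default_factory=list)


def enrich(hits: list[Hit], graph: dict[str, FileNode]) -> list[dict]:
    """Collapse chunk hits to files, then attach graph context to each file."""
    # Step 3: several chunks may point at the same file; keep the best score.
    best: dict[str, float] = {}
    for hit in hits:
        if hit.path not in best or hit.score > best[hit.path]:
            best[hit.path] = hit.score

    results = []
    for path, score in sorted(best.items(), key=lambda kv: kv[1], reverse=True):
        node = graph.get(path)
        if node is None:
            # The vector index references a file the graph does not know about.
            continue
        # Step 4: metadata, parent directory, and cross-referenced files.
        results.append({
            "path": path,
            "score": score,
            "metadata": node.metadata,
            "directory": node.parent,
            "related": list(node.related),
        })
    # Step 5: results are already ordered by descending score.
    return results
```

Skipping hits with no matching graph node is one possible policy; returning them without context would be an equally reasonable choice.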

Why This Matters

Pure vector search returns results without structure. Pure graph traversal finds structure without semantic understanding.

Together, they return results that are both semantically relevant AND contextually connected. You find the document about AI, and immediately see it’s related to your Q3 planning folder and the email thread with the team.

Trade-offs

This architecture isn’t simple:

  • Two databases to maintain
  • Synchronization between the graph and the vector index whenever files change
  • More complex queries
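
One way to keep the synchronization cost visible is a periodic consistency check between the two stores. The sketch below assumes both sides can list the file paths they know about; the function name and the result keys are hypothetical.

```python
def find_drift(graph_paths: set[str], vector_paths: set[str]) -> dict[str, set[str]]:
    """Compare the file paths known to each store and report mismatches."""
    return {
        # Files in the graph that have no embedded chunks yet.
        "missing_embeddings": graph_paths - vector_paths,
        # Chunks whose file no longer exists in the graph.
        "orphaned_embeddings": vector_paths - graph_paths,
    }
```

An empty result on both keys means the two stores agree on which files exist; it says nothing about whether the embedded content is stale.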

For applications where both relationships and similarity matter, the complexity pays off.