Graph Meets Vector

Knowledge Arbitrage’s architecture for combining graph relationships with semantic search.

The Filesystem Problem

Traditional filesystems give you hierarchy. But knowledge doesn’t work that way — a document might belong to multiple projects, reference concepts across folders, and have relationships that evolve over time.

Graph databases model these relationships naturally. But pure graph traversal can’t answer “show me documents similar to this one.”

The Hybrid Approach

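Knowledge Arbitrage combines two databases: Neo4j for graph structure and relationship queries, and LanceDB for vector storage and similarity search.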

Knowledge Arbitrage

A graph-based filesystem that combines Neo4j’s relationship modeling with LanceDB’s vector search capabilities. Files and directories exist as interconnected nodes, while content is chunked and embedded for semantic querying.
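
To make the "chunked and embedded" step concrete, here is a minimal sketch of how chunk records might keep a link back to their file. It is an illustration under stated assumptions, not the project's code: the `ChunkRecord` type, the `"<path>#<index>"` ID scheme, and the fixed-size word chunker are all hypothetical, and the chunker is a simple stand-in for chonkie's semantic chunking. The `embed` callable is supplied by the caller (in the real stack it would presumably wrap a Cohere embedding call).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChunkRecord:
    chunk_id: str        # hypothetical ID scheme: "<file path>#<chunk index>"
    file_path: str       # links the chunk back to its file node in the graph
    text: str
    vector: List[float]  # embedding produced by the caller-supplied function


def chunk_words(text: str, max_words: int) -> List[str]:
    """Naive fixed-size word chunker (stand-in for semantic chunking)."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_records(
    file_path: str,
    text: str,
    embed: Callable[[str], List[float]],
    max_words: int = 50,
) -> List[ChunkRecord]:
    """Chunk a file's text and embed each chunk, keeping the file path on every record."""
    return [
        ChunkRecord(f"{file_path}#{i}", file_path, chunk, embed(chunk))
        for i, chunk in enumerate(chunk_words(text, max_words))
    ]
```

Storing the file path on every record is what would let a vector hit be traced back to a node in the graph.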

Features

  • Graph-Based Storage — Files and directories stored as Neo4j nodes with CONTAINS relationships
  • Hierarchical Structure — Traditional filesystem operations (create, read, write, move, delete)
  • Semantic Search — LanceDB stores embeddings for content-based querying
  • Hybrid Queries — Combine graph traversal with vector similarity
  • Text Optimization — Cleans HTML, normalizes formatting, semantic chunking
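
The graph-based storage described above can be sketched as parameterized Cypher statements. This is a hedged illustration only: the `Directory` and `File` labels, the `path` and `name` properties, and both helper functions are assumptions; the source confirms only that nodes are linked by CONTAINS relationships. The helpers build a query string and its parameters without touching a database.

```python
from typing import Any, Dict, Tuple

Statement = Tuple[str, Dict[str, Any]]


def create_child(parent_path: str, name: str, is_dir: bool = False) -> Statement:
    """Build a statement that attaches a file or directory under an existing directory."""
    label = "Directory" if is_dir else "File"  # hypothetical labels, chosen from a fixed set
    path = parent_path.rstrip("/") + "/" + name
    query = (
        "MATCH (parent:Directory {path: $parent_path}) "
        f"MERGE (child:{label} {{path: $path}}) "
        "SET child.name = $name "
        "MERGE (parent)-[:CONTAINS]->(child)"
    )
    return query, {"parent_path": parent_path, "path": path, "name": name}


def list_files_under(root_path: str) -> Statement:
    """Build a statement that walks CONTAINS edges to any depth and returns file paths."""
    query = (
        "MATCH (root:Directory {path: $root_path})-[:CONTAINS*]->(f:File) "
        "RETURN f.path AS path ORDER BY path"
    )
    return query, {"root_path": root_path}
```

With the official Neo4j Python driver, each pair could be passed to `session.run(query, params)`. A move operation would then amount to deleting one CONTAINS edge and creating another, though how the project implements it is not stated here.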

Technical Highlights

  • Neo4j for graph structure and relationship queries
  • LanceDB for vector storage and similarity search
  • Cohere embed-english-v3.0 for embeddings
  • chonkie for semantic text chunking
  • FastAPI web framework

Tech Stack

Component         Technology
Graph Database    Neo4j 5.0+
Vector Database   LanceDB
Embeddings        Cohere
Text Chunking     chonkie
Web Framework     FastAPI

Architecture

Filesystem (Neo4j)          Embeddings (LanceDB)
├── /projects              → chunk_1 → [embedding]
│   └── /notes             → chunk_2 → [embedding]
└── /docs                  → chunk_3 → [embedding]

Query: "show me files about AI projects"
→ Vector search in LanceDB
→ Returns matching chunks with file paths from Neo4j
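
The query flow in the diagram can be approximated in memory. The sketch below is an assumption-laden illustration, not the project's implementation: plain Python lists stand in for LanceDB, a set of allowed paths stands in for the result of a Neo4j traversal, and `hybrid_search` and the chunk dictionary layout are hypothetical. It ranks chunks by cosine similarity, keeps only files inside the allowed set, and reports the best score per file.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(
    query_vec: Sequence[float],
    chunks: List[dict],
    allowed_paths: Set[str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to k (file_path, score) pairs, best first.

    chunks: dicts with "file_path" and "vector" keys (hypothetical layout).
    allowed_paths: file paths permitted by the graph side of the query.
    """
    best: Dict[str, float] = {}
    for chunk in chunks:
        path = chunk["file_path"]
        if path not in allowed_paths:
            continue  # graph filter: skip files outside the traversal result
        score = cosine(query_vec, chunk["vector"])
        if score > best.get(path, float("-inf")):
            best[path] = score  # keep each file's best-matching chunk
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

Filtering before scoring keeps the example short. A real system might instead run the vector search first and resolve paths in the graph afterwards, as the diagram suggests.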