Knowledge Arbitrage

A graph-based filesystem that combines Neo4j’s relationship modeling with LanceDB’s vector search capabilities. Files and directories exist as interconnected nodes, while content is chunked and embedded for semantic querying.

Features

  • Graph-Based Storage — Files and directories stored as Neo4j nodes with CONTAINS relationships
  • Hierarchical Structure — Traditional filesystem operations (create, read, write, move, delete)
  • Semantic Search — LanceDB stores embeddings for content-based querying
  • Hybrid Queries — Combine graph traversal with vector similarity
  • Text Optimization — Cleans HTML, normalizes formatting, semantic chunking
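As a rough illustration of the graph-based storage described above, the sketch below builds Cypher statements that attach a file to its parent directory through a CONTAINS relationship and list a directory's children. Only the CONTAINS relationship comes from this README; the `Directory` and `File` labels, the `path` and `name` properties, and both function names are assumptions made for the example, not the project's actual schema.

```python
from pathlib import PurePosixPath


def create_file_cypher(path: str) -> tuple[str, dict[str, str]]:
    """Build a Cypher statement that links a File node to its parent Directory.

    The Directory/File labels and the path/name properties are hypothetical.
    """
    p = PurePosixPath(path)
    query = (
        "MATCH (parent:Directory {path: $parent_path}) "
        "MERGE (parent)-[:CONTAINS]->(f:File {path: $path, name: $name})"
    )
    params = {"parent_path": str(p.parent), "path": str(p), "name": p.name}
    return query, params


def list_children_cypher(dir_path: str) -> tuple[str, dict[str, str]]:
    """Build a Cypher statement that returns the direct children of a directory."""
    query = (
        "MATCH (d:Directory {path: $path})-[:CONTAINS]->(child) "
        "RETURN child.path AS path ORDER BY path"
    )
    return query, {"path": dir_path}


if __name__ == "__main__":
    query, params = create_file_cypher("/projects/notes/ideas.md")
    print(query)
    print(params)
```

The functions only produce query text and parameters, so they run without a database. In a real setup the pair could be handed to the official Neo4j Python driver, for example with `session.run(query, params)`.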

Technical Highlights

  • Neo4j for graph structure and relationship queries
  • LanceDB for vector storage and similarity search
  • Cohere embed-english-v3.0 for embeddings
  • chonkie for semantic text chunking
  • FastAPI web framework
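To make the text-preparation step more concrete, here is a minimal standard-library sketch of cleaning HTML and splitting the result into chunks. It is a stand-in, not the project's pipeline: the chunker packs whole sentences up to a character limit, which is a simple size-based strategy rather than the semantic chunking that chonkie provides. The names `clean_html` and `chunk_sentences` and the `max_chars` parameter are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_html(raw: str) -> str:
    """Strip tags, decode entities, and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    # Joining with a space keeps block-level text apart; it can leave a
    # stray space around inline tags, which is acceptable for a sketch.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def chunk_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    html = "<p>Hello   world.</p>\n<p>Tom &amp; Jerry.</p><script>var x = 1;</script>"
    text = clean_html(html)
    print(text)
    print(chunk_sentences(text, max_chars=15))
```

Each resulting chunk would then be sent to the embedding model and stored alongside its source file path.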

Tech Stack

Component          Technology
Graph Database     Neo4j 5.0+
Vector Database    LanceDB
Embeddings         Cohere
Text Chunking      chonkie
Web Framework      FastAPI

Architecture

Filesystem (Neo4j)          Embeddings (LanceDB)
├── /projects              → chunk_1 → [embedding]
│   └── /notes             → chunk_2 → [embedding]
└── /docs                  → chunk_3 → [embedding]

Query: "show me files about AI projects"
→ Vector search in LanceDB
→ Returns matching chunks with file paths from Neo4j
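
The query flow above can be sketched with a small in-memory model. This is a toy under stated assumptions: the three-dimensional vectors, the file paths, and the `hybrid_search` function are invented for illustration; a plain list stands in for the LanceDB table and a path-prefix filter stands in for a Neo4j traversal of CONTAINS relationships. In the real system the query vector would come from the embedding model.

```python
import math

# Hypothetical chunk rows: each chunk keeps its embedding and source file path.
CHUNKS = [
    {"chunk_id": "chunk_1", "path": "/projects/ai.md", "vector": [0.9, 0.1, 0.0]},
    {"chunk_id": "chunk_2", "path": "/projects/notes/todo.md", "vector": [0.1, 0.9, 0.0]},
    {"chunk_id": "chunk_3", "path": "/docs/ai-overview.md", "vector": [0.8, 0.2, 0.1]},
]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query_vector: list[float], under: str = "/", top_k: int = 2):
    """Rank chunks by similarity, restricted to files below the `under` directory."""
    # Graph step (stand-in): keep only chunks whose file lives under the directory.
    prefix = under.rstrip("/") + "/"
    candidates = [c for c in CHUNKS if c["path"].startswith(prefix)]
    # Vector step: order the remaining chunks by similarity to the query.
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query_vector, c["vector"]),
        reverse=True,
    )
    return [(c["path"], c["chunk_id"]) for c in ranked[:top_k]]


if __name__ == "__main__":
    query_vector = [1.0, 0.0, 0.0]  # pretend embedding of the user's query
    print(hybrid_search(query_vector))
    print(hybrid_search(query_vector, under="/projects", top_k=1))
```

Filtering by directory before ranking is one possible ordering; ranking first and then resolving paths in the graph, as the flow above suggests, would return the same results on this toy data.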