Knowledge Arbitrage
A graph-based filesystem that combines Neo4j’s relationship modeling with LanceDB’s vector search capabilities. Files and directories exist as interconnected nodes, while content is chunked and embedded for semantic querying.
Features
- Graph-Based Storage: Files and directories are stored as Neo4j nodes linked by `CONTAINS` relationships
- Hierarchical Structure: Supports traditional filesystem operations (create, read, write, move, delete)
- Semantic Search: LanceDB stores embeddings for content-based querying
- Hybrid Queries: Combine graph traversal with vector similarity
- Text Optimization: Cleans HTML, normalizes formatting, and applies semantic chunking
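The storage model in the list above can be illustrated without a database. The following is a minimal in-memory sketch, assuming Python 3.9+. Each `Node` plays the role of a Neo4j file or directory node, and a directory's `children` dict plays the role of its outgoing `CONTAINS` relationships. `Node`, `GraphFS`, and their method names are hypothetical illustrations, not this project's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file or directory node (stand-in for a Neo4j node; hypothetical)."""
    name: str
    is_dir: bool
    content: str = ""
    # Outgoing CONTAINS edges: child name -> child node.
    children: dict[str, "Node"] = field(default_factory=dict)


class GraphFS:
    """Toy filesystem stored as nodes plus CONTAINS edges, rooted at '/'."""

    def __init__(self) -> None:
        self.root = Node("/", is_dir=True)

    def _split(self, path: str) -> list[str]:
        return [p for p in path.strip("/").split("/") if p]

    def _resolve(self, path: str) -> Node:
        node = self.root
        for part in self._split(path):
            node = node.children[part]  # follow one CONTAINS edge
        return node

    def _parent(self, path: str) -> tuple[Node, str]:
        parts = self._split(path)
        if not parts:
            raise ValueError("the root has no parent")
        parent = self._resolve("/" + "/".join(parts[:-1]))
        if not parent.is_dir:
            raise NotADirectoryError(path)
        return parent, parts[-1]

    def mkdir(self, path: str) -> None:
        parent, name = self._parent(path)
        parent.children.setdefault(name, Node(name, is_dir=True))

    def write(self, path: str, content: str) -> None:
        """Create the file if missing, then overwrite its content."""
        parent, name = self._parent(path)
        node = parent.children.setdefault(name, Node(name, is_dir=False))
        if node.is_dir:
            raise IsADirectoryError(path)
        node.content = content

    def read(self, path: str) -> str:
        node = self._resolve(path)
        if node.is_dir:
            raise IsADirectoryError(path)
        return node.content

    def move(self, src: str, dst: str) -> None:
        """Re-parent a subtree (no cycle check in this sketch)."""
        src_parent, src_name = self._parent(src)
        dst_parent, dst_name = self._parent(dst)
        node = src_parent.children.pop(src_name)  # drop the old CONTAINS edge
        node.name = dst_name
        dst_parent.children[dst_name] = node      # add the new CONTAINS edge

    def delete(self, path: str) -> None:
        parent, name = self._parent(path)
        del parent.children[name]  # detaches the whole subtree

    def listdir(self, path: str = "/") -> list[str]:
        return sorted(self._resolve(path).children)
```

In this model a move touches only two edges, however large the moved subtree is. A graph store that does not also copy full path strings onto every node gets the same property: moving a directory means deleting one `CONTAINS` relationship and creating another.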
Technical Highlights
- Neo4j for graph structure and relationship queries
- LanceDB for vector storage and similarity search
- Cohere embed-english-v3.0 for embeddings
- chonkie for semantic text chunking
- FastAPI web framework
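The "Text Optimization" feature and the chonkie dependency together describe a clean-then-chunk step that runs before embedding. The sketch below uses only the Python standard library. It does not reproduce chonkie's API or its semantic chunking; instead it substitutes a simpler greedy sentence-packing strategy. `TextExtractor`, `clean_text`, `chunk_sentences`, `BLOCK_TAGS`, and the 500-character default are illustrative assumptions.

```python
import re
from html.parser import HTMLParser

# Tags treated as text boundaries, so "<p>a</p><p>b</p>" yields "a b", not "ab".
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_text(raw: str) -> str:
    """Strip HTML markup and collapse every whitespace run to one space."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


def chunk_sentences(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A semantic chunker typically places boundaries where the meaning of neighbouring sentences shifts. The length-based packing here only guarantees that no sentence is split mid-way.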
Tech Stack
| Component | Technology |
|---|---|
| Graph Database | Neo4j 5.0+ |
| Vector Database | LanceDB |
| Embeddings | Cohere |
| Text Chunking | chonkie |
| Web Framework | FastAPI |
Architecture
Filesystem (Neo4j)        Embeddings (LanceDB)
├── /projects       →     chunk_1 → [embedding]
│   └── /notes      →     chunk_2 → [embedding]
└── /docs           →     chunk_3 → [embedding]
Query: "show me files about AI projects"
→ Vector search in LanceDB
→ Returns matching chunks with file paths from Neo4j
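The query flow above can be sketched end to end in memory. This is a minimal illustration under stated assumptions, not the project's implementation. A dict of `CONTAINS` edges stands in for Neo4j. A brute-force cosine scan over hand-written 2-D vectors stands in for LanceDB and Cohere embeddings. `Chunk`, `descendants`, and `hybrid_search` are hypothetical names. The optional `allowed_paths` filter shows one way a graph traversal can narrow a vector search, in the spirit of the "Hybrid Queries" feature.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str
    file_path: str        # back-reference to the file node in the graph
    vector: list[float]   # embedding; toy 2-D values in this sketch


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def descendants(contains: dict[str, list[str]], start: str) -> set[str]:
    """Graph step: follow CONTAINS edges breadth-first from `start`."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        for child in contains.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def hybrid_search(
    query_vec: list[float],
    chunks: list[Chunk],
    allowed_paths: Optional[set[str]] = None,
    k: int = 3,
) -> list[tuple[str, str]]:
    """Vector step: rank chunks by cosine similarity, optionally limited to
    files found by the graph step, and return (file_path, text) pairs."""
    candidates = [
        c for c in chunks if allowed_paths is None or c.file_path in allowed_paths
    ]
    candidates.sort(key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return [(c.file_path, c.text) for c in candidates[:k]]
```

For example, passing `descendants(contains, "/projects")` as `allowed_paths` answers "files about AI under /projects" rather than searching every chunk in the store.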