Overview
RCLI’s RAG (Retrieval-Augmented Generation) system lets you index local documents and query them using natural language. It combines vector search with BM25 full-text search, fused with Reciprocal Rank Fusion, for high retrieval accuracy.
- Hybrid Search: vector + BM25 + RRF fusion
- Fast Retrieval: ~4 ms, near-instant search over 5K+ chunks
- On-Device: 100% local, no external API calls
Supported File Types
RCLI can ingest and index the following document formats:
- PDF: Portable Document Format
- DOCX: Microsoft Word documents
- TXT: Plain text files
- MD: Markdown files
All document processing happens locally using native parsers. No cloud services or OCR required.
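The scanning step can be sketched as a recursive walk that keeps only ingestable extensions. This is illustrative: the real scanner may apply extra filters (for example, skipping hidden files) that are not documented here.

```python
from pathlib import Path

# Extensions RCLI can ingest (from the list above).
SUPPORTED = {".pdf", ".docx", ".txt", ".md"}

def find_documents(root):
    """Recursively collect ingestable files under root (sketch)."""
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.suffix.lower() in SUPPORTED)
```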
How It Works
Document Ingestion
Run rcli rag ingest <directory> to index documents. RCLI recursively scans the directory, extracts text, and splits it into 512-token chunks with 50-token overlap.
Index Building
Three indices are built:
- Vector Index: USearch HNSW for semantic search
- BM25 Index: Inverted index for keyword search
- Chunk Store: mmap’d binary file for fast text retrieval
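The chunk store’s zero-copy design can be sketched as a binary file with an (offset, length) header followed by the chunk text. The layout below is illustrative only, not RCLI’s actual on-disk format:

```python
import mmap
import struct

def write_store(path, chunks):
    """Write chunks as: [count][offset,length pairs][UTF-8 chunk text]."""
    data = [c.encode("utf-8") for c in chunks]
    offsets, pos = [], 4 + 8 * len(data)  # text starts after the header
    for d in data:
        offsets.append((pos, len(d)))
        pos += len(d)
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(data)))
        for off, ln in offsets:
            f.write(struct.pack("<II", off, ln))
        for d in data:
            f.write(d)

def read_chunk(path, i):
    """Fetch one chunk; the OS pages in only the bytes actually touched."""
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        off, ln = struct.unpack_from("<II", mm, 4 + 8 * i)
        return mm[off:off + ln].decode("utf-8")
```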
Hybrid Retrieval
RCLI combines two complementary search methods.
Vector Search (Semantic)
- Algorithm: HNSW (Hierarchical Navigable Small World)
- Library: USearch v2.16.5
- Metric: Cosine similarity
- Candidates: 10 (configurable)
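USearch’s HNSW graph accelerates the nearest-neighbor lookup itself; the metric it ranks candidates by is plain cosine similarity, which in pure Python is:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0 and orthogonal vectors score 0, regardless of vector magnitude.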
BM25 (Keyword)
- Algorithm: Best Matching 25 (BM25)
- Parameters: k1=1.5, b=0.75 (standard values)
- Candidates: 10 (configurable)
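As a sketch of how k1 and b enter the scoring formula (not RCLI’s actual implementation), each query term contributes an IDF-weighted, length-normalized term-frequency score:

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_len,
               k1=1.5, b=0.75):
    """Score one document against a query with BM25.

    k1 caps term-frequency saturation; b controls length normalization.
    doc_freq maps each term to the number of documents containing it.
    """
    score, dl = 0.0, len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0 or term not in doc_freq:
            continue
        df = doc_freq[term]
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avg_len))
    return score
```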
Reciprocal Rank Fusion (RRF)
Combines vector and BM25 results using rank-based scoring. RRF gives higher weight to documents that appear in both result sets, improving precision.
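The fusion step fits in a few lines. Each document scores the sum of 1 / (k + rank) over the lists it appears in, so documents ranked well in both lists float to the top; k here corresponds to the rrf_k parameter (60 is the commonly used constant, assumed here):

```python
def rrf_fuse(vector_ranking, bm25_ranking, k=60):
    """Fuse two ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in (vector_ranking, bm25_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

For example, a document ranked second by vector search and first by BM25 outranks a document that only one retriever found.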
Index Structure
The RAG index is stored in ~/Library/RCLI/index/ with the following files:
Chunk Store (mmap)
Chunk text is stored in a memory-mapped binary file for zero-copy retrieval. Using mmap() allows the OS to page in chunk text on demand without loading the entire store into RAM.
Document Processor
The document processor extracts text from each supported format:
- PDF: converted with pdftotext (from poppler-utils)
- DOCX: parsed with a native parser
- TXT / MD: read directly as plain text
Chunking Strategy
- Chunk Size: 512 tokens (~2000 characters)
- Overlap: 50 tokens (~200 characters)
- Preserves: Sentence boundaries (uses SentenceDetector)
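A rough sketch of these parameters in action, approximating tokens as ~4 characters each and omitting the sentence-boundary snapping that the real chunker performs:

```python
def chunk_text(text, chunk_size=512, overlap=50, chars_per_token=4):
    """Split text into overlapping chunks (character-based approximation)."""
    size = chunk_size * chars_per_token               # ~2000 characters
    step = (chunk_size - overlap) * chars_per_token   # advance minus overlap
    chunks = []
    for start in range(0, max(len(text), 1), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk shares roughly 200 trailing characters (~50 tokens) with the next, so sentences split at a boundary still appear whole in one chunk.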
Performance
- Retrieval: 3.82 ms hybrid search
- Embedding: 12 ms per chunk (384-dim)
- Indexing: ~200 docs/sec ingestion
Embedding Cache
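A minimal sketch of an LRU embedding cache; the capacity, keying scheme, and eviction details below are illustrative, not RCLI’s actual internals:

```python
from collections import OrderedDict

class EmbeddingCache:
    """LRU cache: embed each chunk text at most once while it stays hot."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._cache = OrderedDict()

    def get_or_compute(self, text, embed_fn):
        if text in self._cache:
            self._cache.move_to_end(text)    # mark as most recently used
            return self._cache[text]
        vec = embed_fn(text)
        self._cache[text] = vec
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return vec
```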
RCLI caches embeddings with LRU eviction, so repeated chunks are not re-embedded.
Usage Examples
Index Documents
Query Documents
Drag-and-Drop Indexing (TUI)
In the TUI, drag a file or folder from Finder into the terminal window. RCLI automatically:
- Detects the drop event
- Indexes the file/folder
- Loads the index for immediate querying
Configuration
RAG parameters can be tuned via environment variables or the config file.
Parameter Tuning Guide
- vector_candidates: Increase for better recall, decrease for speed
- bm25_candidates: Increase for more keyword matches
- rrf_k: Higher values (e.g., 100) flatten rank differences, reducing the advantage of top-ranked results
- chunk_size: Smaller chunks improve precision, larger chunks improve context
- chunk_overlap: Higher overlap prevents splitting sentences across chunks
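As a sketch of how these knobs might be read with sensible fallbacks. The RCLI_RAG_* variable names are hypothetical placeholders, not documented RCLI variables, and rrf_k’s true default is not stated on this page (60 is the commonly used RRF constant):

```python
import os

# Hypothetical environment variable names, for illustration only.
# Defaults mirror the values documented on this page where available.
config = {
    "vector_candidates": int(os.environ.get("RCLI_RAG_VECTOR_CANDIDATES", 10)),
    "bm25_candidates": int(os.environ.get("RCLI_RAG_BM25_CANDIDATES", 10)),
    "rrf_k": int(os.environ.get("RCLI_RAG_RRF_K", 60)),       # assumed default
    "chunk_size": int(os.environ.get("RCLI_RAG_CHUNK_SIZE", 512)),
    "chunk_overlap": int(os.environ.get("RCLI_RAG_CHUNK_OVERLAP", 50)),
}
```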
Benchmarking
Test RAG performance with RCLI’s built-in benchmarking command.
Troubleshooting
Indexing fails with 'pdftotext not found'
Install poppler-utils (on macOS: brew install poppler).
Out of memory during indexing
Large document collections may exceed available RAM. Try:
- Index in smaller batches
- Increase chunk size to reduce total chunks
- Close other applications
Query returns irrelevant results
Tune retrieval parameters:
- Increase vector_candidates and bm25_candidates to 20
- Lower rrf_k to weight top-ranked results more heavily
- Re-index with a smaller chunk size for better precision
Index loading is slow
The vector index is mmap’d but initial load requires reading metadata. For large indices (>100K chunks), expect 500ms-2s load time.
Next Steps
- RAG Commands: complete command reference for ingestion and querying
- RAG API: embed RAG in your own applications
- Configuration: tune RAG parameters for your use case
- Performance: understand retrieval benchmarks