RCLI’s RAG (Retrieval-Augmented Generation) system indexes local documents and enables natural language Q&A powered by hybrid search (vector + BM25) and LLM generation.
Commands Overview
rcli rag ingest <dir> # Index documents from a directory
rcli rag query <text> # Query indexed documents
rcli rag status # Show index info
Ingestion
Index a Directory
rcli rag ingest ~/Documents/notes
# Output:
RAG Ingest
Indexing documents from: /Users/you/Documents/notes
Processing 47 files...
✓ Indexed 523 chunks
✓ Built vector index (HNSW)
✓ Built BM25 index
Indexing complete!
Query your docs:
rcli rag query "your question here"
rcli ask --rag ~/Library/RCLI/index "your question"
Supported File Types
- PDF — Text extraction via pdftotext
- DOCX — Text extraction via unzip + XML parsing
- TXT — Plain text files
- MD — Markdown files
Other formats are skipped with a warning.
Chunking Strategy
Documents are split into 512-token chunks with 50-token overlap. Each chunk includes:
- Text — The chunk content
- Embedding — 384-dim vector (Snowflake Arctic Embed S)
- Metadata — File path, chunk index
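The chunking scheme above can be sketched as follows. This is a minimal illustration, not the actual implementation (which lives in src/rag/doc_processor.h); the function name is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Split a token sequence into fixed-size chunks with overlap,
// mirroring the documented defaults: 512 tokens per chunk,
// 50 tokens shared between consecutive chunks.
std::vector<std::vector<int>> chunk_tokens(const std::vector<int>& tokens,
                                           int chunk_size = 512,
                                           int overlap = 50) {
    std::vector<std::vector<int>> chunks;
    size_t step = static_cast<size_t>(chunk_size - overlap);  // 462 new tokens per chunk
    for (size_t start = 0; start < tokens.size(); start += step) {
        size_t end = std::min(start + static_cast<size_t>(chunk_size), tokens.size());
        chunks.emplace_back(tokens.begin() + start, tokens.begin() + end);
        if (end == tokens.size()) break;  // last (possibly short) chunk
    }
    return chunks;
}
```

A 1000-token document thus yields three chunks starting at token offsets 0, 462, and 924.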
Index Location
By default, indexes are saved to:
~/Library/RCLI/index/
├── chunks.json # Chunk metadata + text
├── embeddings.bin # Float32 vectors
├── usearch.index # HNSW vector index
└── bm25.json # BM25 term frequencies
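For illustration, embeddings.bin can be read back as a flat float32 array, 384 values per chunk (the Arctic Embed S dimension). This sketch assumes a tightly packed layout with no header; the actual on-disk format may differ, and the function name is hypothetical.

```cpp
#include <cstdio>
#include <vector>

// Read a flat file of float32 vectors, `dim` floats per row.
// Assumes tightly packed little-endian float32 with no header.
std::vector<std::vector<float>> load_embeddings(const char* path,
                                                size_t dim = 384) {
    std::vector<std::vector<float>> out;
    FILE* f = std::fopen(path, "rb");
    if (!f) return out;
    std::vector<float> row(dim);
    while (std::fread(row.data(), sizeof(float), dim, f) == dim)
        out.push_back(row);  // copy the completed row
    std::fclose(f);
    return out;
}
```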
Re-Indexing
Running rcli rag ingest on the same directory replaces the existing index:
rcli rag ingest ~/Documents/notes # First run
rcli rag ingest ~/Documents/notes # Overwrites previous index
To index multiple directories, combine them:
mkdir -p ~/Documents/all-docs
cp -r ~/Documents/notes ~/Documents/all-docs/
cp -r ~/Documents/research ~/Documents/all-docs/
rcli rag ingest ~/Documents/all-docs
Querying
Basic Query
rcli rag query "What were the key decisions from the meeting?"
# Output:
The key decisions were:
1. Launch date moved to Q3
2. Budget increased by 20%
3. Hired 2 additional engineers
Query with Interactive Mode
rcli --rag ~/Library/RCLI/index
# Now all queries use RAG:
> what were the key decisions?
> summarize the project plan
Query with Listen Mode
rcli listen --rag ~/Library/RCLI/index
# Speak: "what were the key decisions?"
# RCLI retrieves context and responds
Query with ask
rcli ask --rag ~/Library/RCLI/index "summarize the project plan"
Hybrid Retrieval
RCLI uses Reciprocal Rank Fusion (RRF) to combine:
- Vector Search — USearch HNSW index (cosine similarity)
- BM25 Full-Text Search — Token-based ranking
This approach balances semantic similarity (vector) with exact keyword matching (BM25).
Retrieval Parameters
- Top-k — 5 chunks retrieved per query
- RRF k — 60 (reciprocal rank fusion constant)
- Embedding cache — LRU cache (256 entries, 99.9% hit rate)
On Apple M3 Max:
- Embedding — ~8ms (cached: 0.01ms)
- Vector search — ~2ms (5K chunks)
- BM25 search — ~1ms
- RRF fusion — ~0.5ms
- Total retrieval — ~4ms (with a cached query embedding)
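The embedding cache mentioned above is a standard LRU keyed by query text. A minimal sketch (the class name and key choice are illustrative, not RCLI internals):

```cpp
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// LRU cache mapping query text -> embedding, capped at a fixed
// number of entries (the docs cite 256). Most-recently-used entries
// sit at the front of the list; eviction pops the back.
class EmbeddingCache {
    size_t capacity_;
    std::list<std::pair<std::string, std::vector<float>>> items_;
    std::unordered_map<std::string, decltype(items_)::iterator> index_;
public:
    explicit EmbeddingCache(size_t capacity = 256) : capacity_(capacity) {}

    // Returns nullptr on miss; on hit, refreshes recency.
    const std::vector<float>* get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        items_.splice(items_.begin(), items_, it->second);  // move to front
        return &it->second->second;
    }

    void put(const std::string& key, std::vector<float> emb) {
        if (auto it = index_.find(key); it != index_.end()) {
            it->second->second = std::move(emb);
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        items_.emplace_front(key, std::move(emb));
        index_[key] = items_.begin();
        if (items_.size() > capacity_) {  // evict least-recently-used
            index_.erase(items_.back().first);
            items_.pop_back();
        }
    }
};
```

A cache hit skips the ~8ms embedding step entirely, which is where the near-zero cached timing comes from.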
Status Command
rcli rag status
# Output:
RAG Index: /Users/you/Library/RCLI/index
Status: indexed
If no index exists:
No RAG index found.
Run: rcli rag ingest <directory>
Options
- --models — Models directory (string, default: ~/Library/RCLI/models); must contain arctic-embed-s.gguf
- --rag — Custom index path for querying (string, default: ~/Library/RCLI/index)
Embedding Model
RCLI uses Snowflake Arctic Embed S (Q8_0 quantized):
- Size — 34 MB
- Dimensions — 384
- Speed — ~8ms per query embedding
- License — Apache 2.0
Download Embedding Model
rcli setup # Includes Arctic Embed S
# Or download manually:
cd ~/Library/RCLI/models
curl -LO https://huggingface.co/snowflake/snowflake-arctic-embed-s-v2.0/resolve/main/arctic-embed-s.gguf
Interactive RAG Panel
In interactive mode (rcli), press R to open the RAG panel:
- Ingest documents — Enter path, index files
- Show status — Display indexed file count
- Clear index — Remove all indexed documents
Example Workflows
Research Assistant
# Index research papers
rcli rag ingest ~/Documents/papers
# Query via voice
rcli listen --rag ~/Library/RCLI/index
# Ask: "what did the paper say about transformers?"
Meeting Notes Q&A
# Index meeting notes
rcli rag ingest ~/Documents/meetings
# Query in interactive mode
rcli --rag ~/Library/RCLI/index
> what were the action items from yesterday's meeting?
> who was assigned to the backend task?
Documentation Search
# Index project docs
rcli rag ingest ~/projects/myapp/docs
# Query from command line
rcli ask --rag ~/Library/RCLI/index "how do I configure authentication?"
Drag-and-Drop Indexing
In the TUI (rcli), drag a file or folder from Finder into the terminal:
# Finder drag → Terminal receives path
/Users/you/Documents/project.pdf
# Type: rag ingest /Users/you/Documents/project.pdf
# Or press R (RAG panel), select "Ingest documents", paste path
Benchmarking RAG
Test retrieval performance:
rcli bench --suite rag --rag ~/Library/RCLI/index
# Output:
--- RAG Benchmark ---
Embedding: 7.8ms
Vector search: 2.1ms
BM25 search: 0.9ms
RRF fusion: 0.4ms
Total retrieval: 3.8ms
Advanced Configuration
Custom Index Path
# Ingest (the index is written to the default location)
rcli rag ingest ~/Documents/notes
# Index saved to ~/Library/RCLI/index (default)
# Query from custom location
rcli rag query "question" --rag /path/to/custom/index
Chunk Size Tuning
Edit src/rag/doc_processor.h and recompile:
static constexpr int CHUNK_SIZE = 512; // Default: 512 tokens
static constexpr int CHUNK_OVERLAP = 50; // Default: 50 tokens
Top-k Retrieval
Edit src/rag/hybrid_retriever.h and recompile:
static constexpr int TOP_K = 5; // Default: 5 chunks
Troubleshooting
Embedding Model Missing
Error: Embedding model not found
Run: rcli setup
Solution: rcli setup downloads arctic-embed-s.gguf
No Documents Indexed
✗ No supported files found in /path/to/dir
Solution: Ensure directory contains .pdf, .docx, .txt, or .md files
Low Retrieval Accuracy
- Increase top-k — Retrieve more chunks (edit source)
- Use better embeddings — Snowflake Arctic Embed S is optimized for speed; larger models may improve accuracy
- Refine queries — Be specific (e.g., “deployment steps” vs “how to deploy?”)
Implementation Details
Vector Index (USearch)
- Algorithm — HNSW (Hierarchical Navigable Small World)
- Distance — Cosine similarity
- Connectivity — M=16, ef_construction=200
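For reference, the cosine similarity the vector index ranks by is simply the normalized dot product of two embeddings:

```cpp
#include <cmath>
#include <vector>

// Cosine similarity between two equal-length embeddings:
// dot(a, b) / (|a| * |b|). Returns a value in [-1, 1].
double cosine_similarity(const std::vector<float>& a,
                         const std::vector<float>& b) {
    double dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```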
BM25 Parameters
- k1 — 1.5 (term frequency saturation)
- b — 0.75 (length normalization)
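With those parameters, the per-term contribution to a document's BM25 score follows the standard Okapi formula. A reference sketch (the idf, term frequency, and length statistics come from the index; this is not RCLI's actual code):

```cpp
// Okapi BM25 contribution of a single query term to one document.
//   idf      — inverse document frequency of the term
//   tf       — term frequency in this document
//   doc_len  — length of this document (tokens)
//   avg_len  — average document length in the corpus
// k1 saturates repeated terms; b controls length normalization.
double bm25_term(double idf, double tf, double doc_len, double avg_len,
                 double k1 = 1.5, double b = 0.75) {
    double norm = k1 * (1.0 - b + b * doc_len / avg_len);
    return idf * (tf * (k1 + 1.0)) / (tf + norm);
}
```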
Reciprocal Rank Fusion
score(chunk) = Σ 1 / (k + rank_i)
k = 60
Where rank_i is the chunk's rank in the vector or BM25 result list.
API Access
For programmatic access, use the C API:
#include <stdio.h>
#include "api/rcli_api.h"

int main(void) {
    RCLIHandle engine = rcli_create(NULL);
    rcli_init(engine, "/path/to/models", 99);

    // Ingest
    rcli_rag_ingest(engine, "/path/to/docs");

    // Query
    const char* response = rcli_rag_query(engine, "your question");
    printf("%s\n", response);

    // Cleanup
    rcli_destroy(engine);
    return 0;
}