RCLI’s RAG (Retrieval-Augmented Generation) system indexes local documents and enables natural language Q&A powered by hybrid search (vector + BM25) and LLM generation.

Commands Overview

rcli rag ingest <dir>    # Index documents from a directory
rcli rag query <text>    # Query indexed documents
rcli rag status          # Show index info

Ingestion

Index a Directory

rcli rag ingest ~/Documents/notes

# Output:
  RAG Ingest
  Indexing documents from: /Users/you/Documents/notes

  Processing 47 files...
  Indexed 523 chunks
  Built vector index (HNSW)
  Built BM25 index

  Indexing complete!

  Query your docs:
    rcli rag query "your question here"
    rcli ask --rag ~/Library/RCLI/index "your question"

Supported File Types

  • PDF — Text extraction via pdftotext
  • DOCX — Text extraction via unzip + XML parsing
  • TXT — Plain text files
  • MD — Markdown files
Other formats are skipped with a warning.

Chunking Strategy

Documents are split into 512-token chunks with 50-token overlap. Each chunk includes:
  • Text — The chunk content
  • Embedding — 384-dim vector (Snowflake Arctic Embed S)
  • Metadata — File path, chunk index

Index Location

By default, indexes are saved to:
~/Library/RCLI/index/
  ├── chunks.json          # Chunk metadata + text
  ├── embeddings.bin       # Float32 vectors
  ├── usearch.index        # HNSW vector index
  └── bm25.json            # BM25 term frequencies

Re-Indexing

Running rcli rag ingest on the same directory replaces the existing index:
rcli rag ingest ~/Documents/notes  # First run
rcli rag ingest ~/Documents/notes  # Overwrites previous index
To index multiple directories, combine them:
mkdir -p ~/Documents/all-docs
cp -r ~/Documents/notes ~/Documents/all-docs/
cp -r ~/Documents/research ~/Documents/all-docs/
rcli rag ingest ~/Documents/all-docs

Querying

Basic Query

rcli rag query "What were the key decisions from the meeting?"

# Output:
The key decisions were:
1. Launch date moved to Q3
2. Budget increased by 20%
3. Hired 2 additional engineers

Query with Interactive Mode

rcli --rag ~/Library/RCLI/index

# Now all queries use RAG:
> what were the key decisions?
> summarize the project plan

Query with Listen Mode

rcli listen --rag ~/Library/RCLI/index

# Speak: "what were the key decisions?"
# RCLI retrieves context and responds

Query with ask

rcli ask --rag ~/Library/RCLI/index "summarize the project plan"

Hybrid Retrieval

RCLI uses Reciprocal Rank Fusion (RRF) to combine:
  1. Vector Search — USearch HNSW index (cosine similarity)
  2. BM25 Full-Text Search — Token-based ranking
This approach balances semantic similarity (vector) with exact keyword matching (BM25).

Retrieval Parameters

  • Top-k — 5 chunks retrieved per query
  • RRF k — 60 (reciprocal rank fusion constant)
  • Embedding cache — LRU cache (256 entries, 99.9% hit rate)

Performance

On Apple M3 Max:
  • Embedding — ~8ms (cached: 0.01ms)
  • Vector search — ~2ms (5K chunks)
  • BM25 search — ~1ms
  • RRF fusion — ~0.5ms
  • Total retrieval — ~4ms

Status Command

rcli rag status

# Output:
  RAG Index: /Users/you/Library/RCLI/index
  Status: indexed
If no index exists:
  No RAG index found.
  Run: rcli rag ingest <directory>

Options

  • --models (string, default: ~/Library/RCLI/models) — Models directory (must contain arctic-embed-s.gguf)
  • --rag (string, default: ~/Library/RCLI/index) — Custom index path for querying

Embedding Model

RCLI uses Snowflake Arctic Embed S (Q8_0 quantized):
  • Size — 34 MB
  • Dimensions — 384
  • Speed — ~8ms per query embedding
  • License — Apache 2.0

Download Embedding Model

rcli setup  # Includes Arctic Embed S

# Or download manually:
cd ~/Library/RCLI/models
curl -LO https://huggingface.co/snowflake/snowflake-arctic-embed-s-v2.0/resolve/main/arctic-embed-s.gguf

Interactive RAG Panel

In interactive mode (rcli), press R to open the RAG panel:
  • Ingest documents — Enter path, index files
  • Show status — Display indexed file count
  • Clear index — Remove all indexed documents

Example Workflows

Research Assistant

# Index research papers
rcli rag ingest ~/Documents/papers

# Query via voice
rcli listen --rag ~/Library/RCLI/index

# Ask: "what did the paper say about transformers?"

Meeting Notes Q&A

# Index meeting notes
rcli rag ingest ~/Documents/meetings

# Query in interactive mode
rcli --rag ~/Library/RCLI/index

> what were the action items from yesterday's meeting?
> who was assigned to the backend task?

Project Docs Q&A

# Index project docs
rcli rag ingest ~/projects/myapp/docs

# Query from command line
rcli ask --rag ~/Library/RCLI/index "how do I configure authentication?"

Drag-and-Drop Indexing

In the TUI (rcli), drag a file or folder from Finder into the terminal:
# Finder drag → Terminal receives path
/Users/you/Documents/project.pdf

# Type: rag ingest /Users/you/Documents/project.pdf
# Or press R (RAG panel), select "Ingest documents", paste path

Benchmarking RAG

Test retrieval performance:
rcli bench --suite rag --rag ~/Library/RCLI/index

# Output:
--- RAG Benchmark ---
  Embedding: 7.8ms
  Vector search: 2.1ms
  BM25 search: 0.9ms
  RRF fusion: 0.4ms
  Total retrieval: 3.8ms

Advanced Configuration

Custom Index Path

# Ingest writes to the default location
rcli rag ingest ~/Documents/notes
# Index saved to ~/Library/RCLI/index

# Query from a custom location with --rag
rcli rag query "question" --rag /path/to/custom/index

Chunk Size Tuning

Edit src/rag/doc_processor.h and recompile:
static constexpr int CHUNK_SIZE = 512;   // Default: 512 tokens
static constexpr int CHUNK_OVERLAP = 50; // Default: 50 tokens

Top-k Retrieval

Edit src/rag/hybrid_retriever.h and recompile:
static constexpr int TOP_K = 5;  // Default: 5 chunks

Troubleshooting

Embedding Model Missing

Error: Embedding model not found
Run: rcli setup
Solution: rcli setup downloads arctic-embed-s.gguf

No Documents Indexed

 No supported files found in /path/to/dir
Solution: Ensure directory contains .pdf, .docx, .txt, or .md files

Low Retrieval Accuracy

  • Increase top-k — Retrieve more chunks (edit source)
  • Use better embeddings — Snowflake Arctic Embed S is optimized for speed; larger models may improve accuracy
  • Refine queries — Be specific (e.g., “deployment steps” vs “how to deploy?”)

Implementation Details

Vector Index (USearch)

  • Algorithm — HNSW (Hierarchical Navigable Small World)
  • Distance — Cosine similarity
  • Connectivity — M=16, ef_construction=200

BM25 Parameters

  • k1 — 1.5 (term frequency saturation)
  • b — 0.75 (length normalization)

RRF Formula

score = Σ (1 / (k + rank_i))
k = 60
Where rank_i is the rank from vector or BM25 search.

API Access

For programmatic access, use the C API:
#include "api/rcli_api.h"

RCLIHandle engine = rcli_create(NULL);
rcli_init(engine, "/path/to/models", 99);

// Ingest
rcli_rag_ingest(engine, "/path/to/docs");

// Query
const char* response = rcli_rag_query(engine, "your question");
printf("%s\n", response);

// Cleanup
rcli_destroy(engine);
