RCLI’s RAG (Retrieval-Augmented Generation) system indexes local documents and enables natural language Q&A powered by hybrid search (vector + BM25) and LLM generation.
Commands Overview
rcli rag ingest <dir> # Index documents from a directory
rcli rag query <text> # Query indexed documents
rcli rag status # Show index info
Ingestion
Index a Directory
rcli rag ingest ~/Documents/notes
# Output:
RAG Ingest
Indexing documents from: /Users/you/Documents/notes
Processing 47 files...
✓ Indexed 523 chunks
✓ Built vector index (HNSW)
✓ Built BM25 index
Indexing complete!
Query your docs:
rcli rag query "your question here"
rcli ask --rag ~/Library/RCLI/index "your question"
Supported File Types
- PDF — Text extraction via pdftotext
- DOCX — Text extraction via unzip + XML parsing
- TXT — Plain text files
- MD — Markdown files
Other formats are skipped with a warning.
Chunking Strategy
Documents are split into 512-token chunks with 50-token overlap. Each chunk includes:
- Text — The chunk content
- Embedding — 384-dim vector (Snowflake Arctic Embed S)
- Metadata — File path, chunk index
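The chunking scheme above can be sketched as follows. This is a minimal illustration, not the actual implementation (which lives in src/rag/doc_processor.h); the function name is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Split a token sequence into fixed-size chunks with overlap,
// mirroring the documented defaults: 512 tokens per chunk,
// 50 tokens shared between consecutive chunks.
std::vector<std::vector<int>> chunk_tokens(const std::vector<int>& tokens,
                                           int chunk_size = 512,
                                           int overlap = 50) {
    std::vector<std::vector<int>> chunks;
    size_t step = static_cast<size_t>(chunk_size - overlap);  // 462 new tokens per chunk
    for (size_t start = 0; start < tokens.size(); start += step) {
        size_t end = std::min(start + static_cast<size_t>(chunk_size), tokens.size());
        chunks.emplace_back(tokens.begin() + start, tokens.begin() + end);
        if (end == tokens.size()) break;  // last (possibly short) chunk
    }
    return chunks;
}
```

A 1000-token document thus yields three chunks starting at token offsets 0, 462, and 924.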
Index Location
By default, indexes are saved to:
~/Library/RCLI/index/
├── chunks.json # Chunk metadata + text
├── embeddings.bin # Float32 vectors
├── usearch.index # HNSW vector index
└── bm25.json # BM25 term frequencies
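For illustration, embeddings.bin can be read back as a flat float32 array, 384 values per chunk (the Arctic Embed S dimension). This sketch assumes a tightly packed layout with no header; the actual on-disk format may differ, and the function name is hypothetical.

```cpp
#include <cstdio>
#include <vector>

// Read a flat file of float32 vectors, `dim` floats per row.
// Assumes tightly packed little-endian float32 with no header.
std::vector<std::vector<float>> load_embeddings(const char* path,
                                                size_t dim = 384) {
    std::vector<std::vector<float>> out;
    FILE* f = std::fopen(path, "rb");
    if (!f) return out;
    std::vector<float> row(dim);
    while (std::fread(row.data(), sizeof(float), dim, f) == dim)
        out.push_back(row);  // copy the completed row
    std::fclose(f);
    return out;
}
```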
Re-Indexing
Running rcli rag ingest on the same directory replaces the existing index:
rcli rag ingest ~/Documents/notes # First run
rcli rag ingest ~/Documents/notes # Overwrites previous index
To index multiple directories, combine them:
mkdir -p ~/Documents/all-docs
cp -r ~/Documents/notes ~/Documents/all-docs/
cp -r ~/Documents/research ~/Documents/all-docs/
rcli rag ingest ~/Documents/all-docs
Querying
Basic Query
rcli rag query "What were the key decisions from the meeting?"
# Output:
The key decisions were:
1. Launch date moved to Q3
2. Budget increased by 20%
3. Hired 2 additional engineers
Query with Interactive Mode
rcli --rag ~/Library/RCLI/index
# Now all queries use RAG:
> what were the key decisions?
> summarize the project plan
Query with Listen Mode
rcli listen --rag ~/Library/RCLI/index
# Speak: "what were the key decisions?"
# RCLI retrieves context and responds
Query with ask
rcli ask --rag ~/Library/RCLI/index "summarize the project plan"
Hybrid Retrieval
RCLI uses Reciprocal Rank Fusion (RRF) to combine:
- Vector Search — USearch HNSW index (cosine similarity)
- BM25 Full-Text Search — Token-based ranking
This approach balances semantic similarity (vector) with exact keyword matching (BM25).
Retrieval Parameters
- Top-k — 5 chunks retrieved per query
- RRF k — 60 (reciprocal rank fusion constant)
- Embedding cache — LRU cache (256 entries, 99.9% hit rate)
On Apple M3 Max:
- Embedding — ~8ms (cached: 0.01ms)
- Vector search — ~2ms (5K chunks)
- BM25 search — ~1ms
- RRF fusion — ~0.5ms
- Total retrieval — ~4ms (with a cached query embedding)
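The embedding cache mentioned above is a standard LRU keyed by query text. A minimal sketch (the class name and key choice are illustrative, not RCLI internals):

```cpp
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// LRU cache mapping query text -> embedding, capped at a fixed
// number of entries (the docs cite 256). Most-recently-used entries
// sit at the front of the list; eviction pops the back.
class EmbeddingCache {
    size_t capacity_;
    std::list<std::pair<std::string, std::vector<float>>> items_;
    std::unordered_map<std::string, decltype(items_)::iterator> index_;
public:
    explicit EmbeddingCache(size_t capacity = 256) : capacity_(capacity) {}

    // Returns nullptr on miss; on hit, refreshes recency.
    const std::vector<float>* get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        items_.splice(items_.begin(), items_, it->second);  // move to front
        return &it->second->second;
    }

    void put(const std::string& key, std::vector<float> emb) {
        if (auto it = index_.find(key); it != index_.end()) {
            it->second->second = std::move(emb);
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        items_.emplace_front(key, std::move(emb));
        index_[key] = items_.begin();
        if (items_.size() > capacity_) {  // evict least-recently-used
            index_.erase(items_.back().first);
            items_.pop_back();
        }
    }
};
```

A cache hit skips the ~8ms embedding step entirely, which is where the near-zero cached timing comes from.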
Status Command
rcli rag status
# Output:
RAG Index: /Users/you/Library/RCLI/index
Status: indexed
If no index exists:
No RAG index found.
Run: rcli rag ingest <directory>
Options
- --models — Models directory (string, default: ~/Library/RCLI/models); must contain arctic-embed-s.gguf
- --rag — Custom index path for querying (string, default: ~/Library/RCLI/index)
Embedding Model
RCLI uses Snowflake Arctic Embed S (Q8_0 quantized):
- Size — 34 MB
- Dimensions — 384
- Speed — ~8ms per query embedding
- License — Apache 2.0
Download Embedding Model
rcli setup # Includes Arctic Embed S
# Or download manually:
cd ~/Library/RCLI/models
curl -LO https://huggingface.co/snowflake/snowflake-arctic-embed-s-v2.0/resolve/main/arctic-embed-s.gguf
Interactive RAG Panel
In interactive mode (rcli), press R to open the RAG panel:
- Ingest documents — Enter path, index files
- Show status — Display indexed file count
- Clear index — Remove all indexed documents
Example Workflows
Research Assistant
# Index research papers
rcli rag ingest ~/Documents/papers
# Query via voice
rcli listen --rag ~/Library/RCLI/index
# Ask: "what did the paper say about transformers?"
Meeting Notes Q&A
# Index meeting notes
rcli rag ingest ~/Documents/meetings
# Query in interactive mode
rcli --rag ~/Library/RCLI/index
> what were the action items from yesterday's meeting?
> who was assigned to the backend task?
Documentation Search
# Index project docs
rcli rag ingest ~/projects/myapp/docs
# Query from command line
rcli ask --rag ~/Library/RCLI/index "how do I configure authentication?"
Drag-and-Drop Indexing
In the TUI (rcli), drag a file or folder from Finder into the terminal:
# Finder drag → Terminal receives path
/Users/you/Documents/project.pdf
# Type: rag ingest /Users/you/Documents/project.pdf
# Or press R (RAG panel), select "Ingest documents", paste path
Benchmarking RAG
Test retrieval performance:
rcli bench --suite rag --rag ~/Library/RCLI/index
# Output:
--- RAG Benchmark ---
Embedding: 7.8ms
Vector search: 2.1ms
BM25 search: 0.9ms
RRF fusion: 0.4ms
Total retrieval: 3.8ms
Advanced Configuration
Custom Index Path
# Ingest (the index is written to the default location)
rcli rag ingest ~/Documents/notes
# Index saved to ~/Library/RCLI/index (default)
# Query from custom location
rcli rag query "question" --rag /path/to/custom/index
Chunk Size Tuning
Edit src/rag/doc_processor.h and recompile:
static constexpr int CHUNK_SIZE = 512; // Default: 512 tokens
static constexpr int CHUNK_OVERLAP = 50; // Default: 50 tokens
Top-k Retrieval
Edit src/rag/hybrid_retriever.h and recompile:
static constexpr int TOP_K = 5; // Default: 5 chunks
Troubleshooting
Embedding Model Missing
Error: Embedding model not found
Run: rcli setup
Solution: rcli setup downloads arctic-embed-s.gguf
No Documents Indexed
✗ No supported files found in /path/to/dir
Solution: Ensure directory contains .pdf, .docx, .txt, or .md files
Low Retrieval Accuracy
- Increase top-k — Retrieve more chunks (edit source)
- Use better embeddings — Snowflake Arctic Embed S is optimized for speed; larger models may improve accuracy
- Refine queries — Be specific (e.g., “deployment steps” vs “how to deploy?”)
Implementation Details
Vector Index (USearch)
- Algorithm — HNSW (Hierarchical Navigable Small World)
- Distance — Cosine similarity
- Connectivity — M=16, ef_construction=200
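For reference, the cosine similarity the vector index ranks by is simply the normalized dot product of two embeddings:

```cpp
#include <cmath>
#include <vector>

// Cosine similarity between two equal-length embeddings:
// dot(a, b) / (|a| * |b|). Returns a value in [-1, 1].
double cosine_similarity(const std::vector<float>& a,
                         const std::vector<float>& b) {
    double dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```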
BM25 Parameters
- k1 — 1.5 (term frequency saturation)
- b — 0.75 (length normalization)
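With those parameters, the per-term contribution to a document's BM25 score follows the standard Okapi formula. A reference sketch (the idf, term frequency, and length statistics come from the index; this is not RCLI's actual code):

```cpp
// Okapi BM25 contribution of a single query term to one document.
//   idf      — inverse document frequency of the term
//   tf       — term frequency in this document
//   doc_len  — length of this document (tokens)
//   avg_len  — average document length in the corpus
// k1 saturates repeated terms; b controls length normalization.
double bm25_term(double idf, double tf, double doc_len, double avg_len,
                 double k1 = 1.5, double b = 0.75) {
    double norm = k1 * (1.0 - b + b * doc_len / avg_len);
    return idf * (tf * (k1 + 1.0)) / (tf + norm);
}
```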
Reciprocal Rank Fusion
score(chunk) = Σ 1 / (k + rank_i)
k = 60
Where rank_i is the chunk's rank in the vector or BM25 result list.
API Access
For programmatic access, use the C API:
#include <stdio.h>
#include "api/rcli_api.h"

int main(void) {
    RCLIHandle engine = rcli_create(NULL);
    rcli_init(engine, "/path/to/models", 99);

    // Ingest
    rcli_rag_ingest(engine, "/path/to/docs");

    // Query
    const char* response = rcli_rag_query(engine, "your question");
    printf("%s\n", response);

    // Cleanup
    rcli_destroy(engine);
    return 0;
}