RAG lets agents search through your documents to find relevant information before responding. Docker Agent supports:
  • Background indexing — files are indexed automatically and re-indexed on change
  • Multiple strategies — semantic embeddings, BM25 keyword search, and LLM-enhanced semantic search
  • Hybrid search — combine strategies with result fusion for best recall
  • Reranking — re-score results with a model for improved relevance

Quick start

rag:
  my_docs:
    tool:
      description: "Technical documentation"
    docs: [./documents, ./some-doc.md]
    strategies:
      - type: chunked-embeddings
        embedding_model: openai/text-embedding-3-small
        database: ./docs.db
        vector_dimensions: 1536

agents:
  root:
    model: openai/gpt-4o
    instruction: |
      You have access to a knowledge base. Use it to answer questions.
    rag: [my_docs]

Retrieval strategies

Chunked embeddings

Uses an embedding model to find semantically similar content. Best for understanding intent, synonyms, and paraphrasing.
strategies:
  - type: chunked-embeddings
    embedding_model: openai/text-embedding-3-small
    database: ./vector.db
    vector_dimensions: 1536
    similarity_metric: cosine_similarity
    threshold: 0.5
    limit: 10
    batch_size: 50
    max_embedding_concurrency: 3
    chunking:
      size: 1000
      overlap: 100
      respect_word_boundaries: true
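
To give a feel for how the `size`, `overlap`, and `respect_word_boundaries` parameters interact, here is a minimal sketch. The function name and exact boundary handling are illustrative assumptions, not Docker Agent's implementation:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100,
               respect_word_boundaries: bool = True) -> list[str]:
    """Split text into chunks of at most `size` characters, each sharing
    roughly `overlap` characters with the previous chunk."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if respect_word_boundaries and end < len(text):
            # Back up to the last space so the chunk doesn't end mid-word.
            boundary = text.rfind(" ", start, end)
            if boundary > start:
                end = boundary
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap`, but always make forward progress.
        start = max(end - overlap, start + 1)
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing slightly more text.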

Semantic embeddings (LLM-enhanced)

Uses an LLM to generate semantic summaries of each chunk before embedding. Captures meaning and intent rather than literal text. Best for code search.
strategies:
  - type: semantic-embeddings
    embedding_model: openai/text-embedding-3-small
    vector_dimensions: 1536
    chat_model: openai/gpt-4o-mini
    database: ./semantic.db
    ast_context: true
    chunking:
      size: 1000
      code_aware: true
Semantic embeddings provide higher quality retrieval but slower indexing (one LLM call per chunk) and additional API costs.
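
Conceptually, semantic indexing embeds an LLM-written description of each chunk rather than the chunk itself, so queries match intent instead of literal wording. A toy sketch, where `summarize` and `embed` are stand-ins for the configured chat and embedding models:

```python
def index_chunk(chunk: str, path: str, summarize, embed):
    """Embed an LLM-written description of the chunk instead of the raw
    text, so queries match meaning rather than exact tokens."""
    prompt = f"Describe what this content from {path} does:\n{chunk}"
    summary = summarize(prompt)
    return {"path": path, "chunk": chunk, "vector": embed(summary)}
```

This is why indexing costs one LLM call per chunk: each chunk is summarized before it is embedded.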
BM25 keyword search

Traditional keyword matching using the BM25 algorithm. Best for exact terms, technical jargon, and code identifiers.
strategies:
  - type: bm25
    database: ./bm25.db
    k1: 1.5
    b: 0.75
    threshold: 0.3
    limit: 10
    chunking:
      size: 1000
      overlap: 100
      respect_word_boundaries: true
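
The `k1` and `b` knobs come straight from the BM25 formula: `k1` saturates term frequency, `b` penalizes long documents. A self-contained scoring sketch (illustrative only, not Docker Agent's implementation):

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Score one document (a list of terms) against a query with BM25.
    k1 controls term-frequency saturation; b controls length normalization."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        # Rarer terms get a higher inverse-document-frequency weight.
        df = sum(1 for d in corpus if term in d)
        idf = math.log((len(corpus) - df + 0.5) / (df + 0.5) + 1)
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
        score += idf * norm
    return score
```

With `b=0` document length is ignored entirely; with `b=1` scores are fully normalized by length, which is why the 0.75 default is a middle ground.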
Hybrid search

Combine multiple strategies. The strategies run in parallel and their results are fused together:
rag:
  knowledge_base:
    tool:
      description: Search for information about blorks
    docs:
      - ./blork_field_guide.txt
    strategies:
      - type: chunked-embeddings
        embedding_model: openai/text-embedding-3-small
        docs:
          - ./docs
        database: ./chunked_embeddings.db
        similarity_metric: cosine_similarity
        threshold: 0.5
        limit: 20
        vector_dimensions: 1536
        chunking:
          size: 1000
          overlap: 100
      - type: bm25
        docs:
          - ./docs
        database: ./bm25.db
        k1: 1.5
        b: 0.75
        threshold: 0.3
        limit: 15
        chunking:
          size: 1000
          overlap: 100
    results:
      fusion:
        strategy: rrf
        k: 60
      deduplicate: true
      limit: 5

Fusion strategies

| Strategy | Best for | Description |
| --- | --- | --- |
| rrf | General use (recommended) | Reciprocal Rank Fusion — rank-based, no score normalization needed |
| weighted | Known performance characteristics | Weight strategies differently (e.g., embeddings 0.7, BM25 0.3) |
| max | Same scoring scale | Takes the maximum score from any strategy |
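
For intuition, Reciprocal Rank Fusion can be sketched in a few lines. This assumes each strategy returns a ranked list of document IDs; it is an illustration of the algorithm, not Docker Agent's code:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: each document contributes 1/(k + rank) for
    every list it appears in. Rank-based, so the strategies' raw scores
    never need to be normalized against each other."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked moderately well by both strategies usually beats one ranked first by only a single strategy, which is what makes RRF a safe default.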

Reranking

Re-score retrieved results with a model to improve relevance. Happens after retrieval and fusion, before the final limit.
results:
  reranking:
    model: openai/gpt-4.1-nano
    top_k: 10
    threshold: 0.3
    criteria: |
      Prioritize official documentation over blog posts.
      Prefer recent information and practical examples.
  deduplicate: true
  limit: 5
Supported reranking providers: DMR (native /rerank endpoint), OpenAI, Anthropic, Google Gemini.
To use DMR for reranking, pull a reranking model first:
docker model pull hf.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF
If reranking fails, the system falls back to original retrieval scores.
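
The rerank-then-filter flow, including that fallback, can be sketched as follows. Here `score_fn` is a hypothetical stand-in for a call to the configured reranking model:

```python
def rerank(results, score_fn, top_k=10, threshold=0.3):
    """Re-score the top_k retrieved results with the reranking model; on
    failure, fall back to the original retrieval order."""
    head, tail = results[:top_k], results[top_k:]
    try:
        scored = [(score_fn(r), r) for r in head]
    except Exception:
        return results  # fallback: keep the original retrieval scores
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for score, r in scored if score >= threshold] + tail
```

Because reranking runs after fusion but before the final `limit`, it can promote a result that no single strategy ranked highly.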

Code-aware chunking

For source code, enable AST-based chunking to keep functions and methods intact:
chunking:
  size: 2000
  code_aware: true
Currently supports Go (.go) files via tree-sitter. Other file types fall back to plain-text chunking.

Debugging RAG

Run with debug logging enabled:
docker agent run config.yaml --debug --log-file debug.log
Search the log for tags: [RAG Manager], [Chunked-Embeddings Strategy], [BM25 Strategy], [RRF Fusion], [Reranker].

Configuration reference

Top-level fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| docs | []string | | Document paths/directories shared across strategies |
| tool.description | string | | Description of the RAG tool shown to the agent |
| respect_vcs | boolean | true | Respect .gitignore when indexing |
| strategies | []object | | Array of retrieval strategy configurations |
| results | object | | Post-processing: fusion, reranking, deduplication, final limit |

Chunked-embeddings strategy

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| embedding_model | string | | Required. Embedding model reference |
| database | string | | Path to local SQLite database |
| vector_dimensions | int | | Embedding dimensions (e.g., 1536 for text-embedding-3-small) |
| similarity_metric | string | cosine_similarity | Similarity metric |
| threshold | float | 0.5 | Minimum similarity score (0–1) |
| limit | int | 5 | Max results from this strategy |
| batch_size | int | 50 | Chunks per embedding request |
| max_embedding_concurrency | int | 3 | Max concurrent embedding requests |
| chunking.size | int | 1000 | Chunk size in characters |
| chunking.overlap | int | 75 | Overlap between chunks in characters |
| chunking.code_aware | bool | false | AST-based chunking (Go files) |
| chunking.respect_word_boundaries | bool | false | Avoid splitting mid-word |

Semantic-embeddings strategy

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| embedding_model | string | | Required. Embedding model reference |
| chat_model | string | | Required. LLM for generating semantic summaries |
| vector_dimensions | int | | Required. Embedding dimensions |
| database | string | | Path to local SQLite database |
| semantic_prompt | string | (built-in) | Custom prompt template (${path}, ${content}, ${ast_context}) |
| ast_context | bool | false | Include tree-sitter AST metadata in prompts |
| threshold | float | 0.5 | Minimum similarity score |
| limit | int | 5 | Max results |
| max_indexing_concurrency | int | 3 | Max concurrent file indexing |
| chunking.size | int | 1000 | Chunk size in characters |
| chunking.overlap | int | 75 | Overlap between chunks |
| chunking.code_aware | bool | false | AST-based chunking |

BM25 strategy

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| database | string | | Path to local SQLite database |
| k1 | float | 1.5 | Term frequency saturation (1.2–2.0 recommended) |
| b | float | 0.75 | Length normalization (0–1) |
| threshold | float | 0.0 | Minimum BM25 score |
| limit | int | 5 | Max results |
| chunking.size | int | 1000 | Chunk size in characters |
| chunking.overlap | int | 75 | Overlap between chunks |

Results (post-processing)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| fusion.strategy | string | rrf | Fusion method: rrf, weighted, or max |
| fusion.k | int | 60 | RRF rank constant |
| deduplicate | bool | true | Remove duplicate results |
| limit | int | 15 | Final number of results to return |
| include_score | bool | false | Include relevance scores in results |
| return_full_content | bool | false | Return full document instead of matched chunks |
| reranking.model | string | | Reranking model reference |
| reranking.top_k | int | (all) | Only rerank the top K results |
| reranking.threshold | float | 0.5 | Minimum relevance score after reranking |
| reranking.criteria | string | | Custom relevance guidance for the reranking model |
