RAG lets agents search through your documents to find relevant information before responding. Docker Agent supports:
  • Background indexing — files are indexed automatically and re-indexed on change
  • Multiple strategies — semantic embeddings, BM25 keyword search, and LLM-enhanced semantic search
  • Hybrid search — combine strategies with result fusion for best recall
  • Reranking — re-score results with a model for improved relevance

Quick start

rag:
  my_docs:
    tool:
      description: "Technical documentation"
    docs: [./documents, ./some-doc.md]
    strategies:
      - type: chunked-embeddings
        embedding_model: openai/text-embedding-3-small
        database: ./docs.db
        vector_dimensions: 1536

agents:
  root:
    model: openai/gpt-4o
    instruction: |
      You have access to a knowledge base. Use it to answer questions.
    rag: [my_docs]

Retrieval strategies

Chunked embeddings

Uses an embedding model to find semantically similar content. Best for understanding intent, synonyms, and paraphrasing.
strategies:
  - type: chunked-embeddings
    embedding_model: openai/text-embedding-3-small
    database: ./vector.db
    vector_dimensions: 1536
    similarity_metric: cosine_similarity
    threshold: 0.5
    limit: 10
    batch_size: 50
    max_embedding_concurrency: 3
    chunking:
      size: 1000
      overlap: 100
      respect_word_boundaries: true
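
To give a feel for how the `size`, `overlap`, and `respect_word_boundaries` parameters interact, here is a minimal sketch. The function name and exact boundary handling are illustrative assumptions, not Docker Agent's implementation:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100,
               respect_word_boundaries: bool = True) -> list[str]:
    """Split text into chunks of at most `size` characters, each sharing
    roughly `overlap` characters with the previous chunk."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if respect_word_boundaries and end < len(text):
            # Back up to the last space so the chunk doesn't end mid-word.
            boundary = text.rfind(" ", start, end)
            if boundary > start:
                end = boundary
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap`, but always make forward progress.
        start = max(end - overlap, start + 1)
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing slightly more text.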

Semantic embeddings (LLM-enhanced)

Uses an LLM to generate semantic summaries of each chunk before embedding. Captures meaning and intent rather than literal text. Best for code search.
strategies:
  - type: semantic-embeddings
    embedding_model: openai/text-embedding-3-small
    vector_dimensions: 1536
    chat_model: openai/gpt-4o-mini
    database: ./semantic.db
    ast_context: true
    chunking:
      size: 1000
      code_aware: true
Semantic embeddings provide higher quality retrieval but slower indexing (one LLM call per chunk) and additional API costs.
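
Conceptually, semantic indexing embeds an LLM-written description of each chunk rather than the chunk itself, so queries match intent instead of literal wording. A toy sketch, where `summarize` and `embed` are stand-ins for the configured chat and embedding models:

```python
def index_chunk(chunk: str, path: str, summarize, embed):
    """Embed an LLM-written description of the chunk instead of the raw
    text, so queries match meaning rather than exact tokens."""
    prompt = f"Describe what this content from {path} does:\n{chunk}"
    summary = summarize(prompt)
    return {"path": path, "chunk": chunk, "vector": embed(summary)}
```

This is why indexing costs one LLM call per chunk: each chunk is summarized before it is embedded.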
BM25 keyword search

Traditional keyword matching using the BM25 algorithm. Best for exact terms, technical jargon, and code identifiers.
strategies:
  - type: bm25
    database: ./bm25.db
    k1: 1.5
    b: 0.75
    threshold: 0.3
    limit: 10
    chunking:
      size: 1000
      overlap: 100
      respect_word_boundaries: true
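
The `k1` and `b` knobs come straight from the BM25 formula: `k1` saturates term frequency, `b` penalizes long documents. A self-contained scoring sketch (illustrative only, not Docker Agent's implementation):

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Score one document (a list of terms) against a query with BM25.
    k1 controls term-frequency saturation; b controls length normalization."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        # Rarer terms get a higher inverse-document-frequency weight.
        df = sum(1 for d in corpus if term in d)
        idf = math.log((len(corpus) - df + 0.5) / (df + 0.5) + 1)
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
        score += idf * norm
    return score
```

With `b=0` document length is ignored entirely; with `b=1` scores are fully normalized by length, which is why the 0.75 default is a middle ground.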
Hybrid search

Combine multiple strategies. The strategies run in parallel and their results are fused together:
rag:
  knowledge_base:
    tool:
      description: Search for information about blorks
    docs:
      - ./blork_field_guide.txt
    strategies:
      - type: chunked-embeddings
        embedding_model: openai/text-embedding-3-small
        docs:
          - ./docs
        database: ./chunked_embeddings.db
        similarity_metric: cosine_similarity
        threshold: 0.5
        limit: 20
        vector_dimensions: 1536
        chunking:
          size: 1000
          overlap: 100
      - type: bm25
        docs:
          - ./docs
        database: ./bm25.db
        k1: 1.5
        b: 0.75
        threshold: 0.3
        limit: 15
        chunking:
          size: 1000
          overlap: 100
    results:
      fusion:
        strategy: rrf
        k: 60
      deduplicate: true
      limit: 5

Fusion strategies

| Strategy | Best for | Description |
| --- | --- | --- |
| rrf | General use (recommended) | Reciprocal Rank Fusion — rank-based, no score normalization needed |
| weighted | Known performance characteristics | Weight strategies differently (e.g., embeddings 0.7, BM25 0.3) |
| max | Same scoring scale | Takes the maximum score from any strategy |
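
For intuition, Reciprocal Rank Fusion can be sketched in a few lines. This assumes each strategy returns a ranked list of document IDs; it is an illustration of the algorithm, not Docker Agent's code:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: each document contributes 1/(k + rank) for
    every list it appears in. Rank-based, so the strategies' raw scores
    never need to be normalized against each other."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked moderately well by both strategies usually beats one ranked first by only a single strategy, which is what makes RRF a safe default.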

Reranking

Re-score retrieved results with a model to improve relevance. Happens after retrieval and fusion, before the final limit.
results:
  reranking:
    model: openai/gpt-4.1-nano
    top_k: 10
    threshold: 0.3
    criteria: |
      Prioritize official documentation over blog posts.
      Prefer recent information and practical examples.
  deduplicate: true
  limit: 5
Supported reranking providers: DMR (native /rerank endpoint), OpenAI, Anthropic, Google Gemini.
To use DMR for reranking, pull a reranking model first:
docker model pull hf.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF
If reranking fails, the system falls back to original retrieval scores.
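
The rerank-then-filter flow, including that fallback, can be sketched as follows. Here `score_fn` is a hypothetical stand-in for a call to the configured reranking model:

```python
def rerank(results, score_fn, top_k=10, threshold=0.3):
    """Re-score the top_k retrieved results with the reranking model; on
    failure, fall back to the original retrieval order."""
    head, tail = results[:top_k], results[top_k:]
    try:
        scored = [(score_fn(r), r) for r in head]
    except Exception:
        return results  # fallback: keep the original retrieval scores
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for score, r in scored if score >= threshold] + tail
```

Because reranking runs after fusion but before the final `limit`, it can promote a result that no single strategy ranked highly.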

Code-aware chunking

For source code, enable AST-based chunking to keep functions and methods intact:
chunking:
  size: 2000
  code_aware: true
Currently supports Go (.go) files via tree-sitter. Other file types fall back to plain-text chunking.

Debugging RAG

Run with debug logging enabled:
docker agent run config.yaml --debug --log-file debug.log
Search the log for tags: [RAG Manager], [Chunked-Embeddings Strategy], [BM25 Strategy], [RRF Fusion], [Reranker].

Configuration reference

Top-level fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| docs | []string | | Document paths/directories shared across strategies |
| tool.description | string | | Description of the RAG tool shown to the agent |
| respect_vcs | boolean | true | Respect .gitignore when indexing |
| strategies | []object | | Array of retrieval strategy configurations |
| results | object | | Post-processing: fusion, reranking, deduplication, final limit |

Chunked-embeddings strategy

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| embedding_model | string | | Required. Embedding model reference |
| database | string | | Path to local SQLite database |
| vector_dimensions | int | | Embedding dimensions (e.g., 1536 for text-embedding-3-small) |
| similarity_metric | string | cosine_similarity | Similarity metric |
| threshold | float | 0.5 | Minimum similarity score (0–1) |
| limit | int | 5 | Max results from this strategy |
| batch_size | int | 50 | Chunks per embedding request |
| max_embedding_concurrency | int | 3 | Max concurrent embedding requests |
| chunking.size | int | 1000 | Chunk size in characters |
| chunking.overlap | int | 75 | Overlap between chunks in characters |
| chunking.code_aware | bool | false | AST-based chunking (Go files) |
| chunking.respect_word_boundaries | bool | false | Avoid splitting mid-word |

Semantic-embeddings strategy

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| embedding_model | string | | Required. Embedding model reference |
| chat_model | string | | Required. LLM for generating semantic summaries |
| vector_dimensions | int | | Required. Embedding dimensions |
| database | string | | Path to local SQLite database |
| semantic_prompt | string | (built-in) | Custom prompt template (${path}, ${content}, ${ast_context}) |
| ast_context | bool | false | Include tree-sitter AST metadata in prompts |
| threshold | float | 0.5 | Minimum similarity score |
| limit | int | 5 | Max results |
| max_indexing_concurrency | int | 3 | Max concurrent file indexing |
| chunking.size | int | 1000 | Chunk size in characters |
| chunking.overlap | int | 75 | Overlap between chunks |
| chunking.code_aware | bool | false | AST-based chunking |

BM25 strategy

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| database | string | | Path to local SQLite database |
| k1 | float | 1.5 | Term frequency saturation (1.2–2.0 recommended) |
| b | float | 0.75 | Length normalization (0–1) |
| threshold | float | 0.0 | Minimum BM25 score |
| limit | int | 5 | Max results |
| chunking.size | int | 1000 | Chunk size in characters |
| chunking.overlap | int | 75 | Overlap between chunks |

Results (post-processing)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| fusion.strategy | string | rrf | Fusion method: rrf, weighted, or max |
| fusion.k | int | 60 | RRF rank constant |
| deduplicate | bool | true | Remove duplicate results |
| limit | int | 15 | Final number of results to return |
| include_score | bool | false | Include relevance scores in results |
| return_full_content | bool | false | Return full document instead of matched chunks |
| reranking.model | string | | Reranking model reference |
| reranking.top_k | int | (all) | Only rerank the top K results |
| reranking.threshold | float | 0.5 | Minimum relevance score after reranking |
| reranking.criteria | string | | Custom relevance guidance for the reranking model |
