
Overview

Open WebUI’s RAG (Retrieval-Augmented Generation) system enables powerful document-based chat interactions by combining vector search, hybrid retrieval, and multiple content extraction engines.

Document Upload & Processing

Supported File Types

Open WebUI supports a wide range of file formats:
  • PDF (with OCR support)
  • Word (DOC, DOCX)
  • PowerPoint (PPT, PPTX)
  • Excel (XLS, XLSX)
  • Plain text (TXT, MD)
  • Rich text (RTF)

Content Extraction Engines

Choose from multiple extraction engines based on your needs:

Tika

Apache Tika - Universal document parser
  • Supports 1000+ file formats
  • Metadata extraction
  • Self-hosted option

Docling

IBM Docling - AI-powered extraction
  • Advanced layout understanding
  • Table structure preservation
  • High accuracy for complex documents

Document Intelligence

Azure Document Intelligence
  • Cloud-based OCR
  • Form recognition
  • Layout analysis
  • Custom model support

Mistral OCR

Mistral OCR API
  • AI-powered image text extraction
  • Multi-language support
  • High-quality results

Configuration

Configure content extraction settings:
# From routers/retrieval.py:481-511
{
  "CONTENT_EXTRACTION_ENGINE": "tika",  // tika, docling, azure, mistral
  "PDF_EXTRACT_IMAGES": true,
  "PDF_LOADER_MODE": "auto",  // auto, fast, quality
  
  // Tika settings
  "TIKA_SERVER_URL": "http://tika:9998",
  
  // Docling settings
  "DOCLING_SERVER_URL": "http://docling:5000",
  "DOCLING_API_KEY": "your-api-key",
  "DOCLING_PARAMS": {...},
  
  // Azure Document Intelligence
  "DOCUMENT_INTELLIGENCE_ENDPOINT": "https://your-instance.cognitiveservices.azure.com",
  "DOCUMENT_INTELLIGENCE_KEY": "your-key",
  "DOCUMENT_INTELLIGENCE_MODEL": "prebuilt-read",
  
  // Mistral OCR
  "MISTRAL_OCR_API_BASE_URL": "https://api.mistral.ai/v1",
  "MISTRAL_OCR_API_KEY": "your-key"
}

Vector Database Support

Open WebUI supports nine vector database options. The default is an embedded database:
  • No external dependencies
  • Perfect for single-node deployments
  • Persistent storage

Embedding Configuration

Embedding Models

Configure embedding generation:
1. Choose Engine

{
  "RAG_EMBEDDING_ENGINE": "ollama",  // "", ollama, openai, azure_openai
  "RAG_EMBEDDING_MODEL": "nomic-embed-text"
}
2. Configure Provider

Ollama:
{
  "RAG_OLLAMA_BASE_URL": "http://ollama:11434",
  "RAG_OLLAMA_API_KEY": ""
}
OpenAI:
{
  "RAG_OPENAI_API_BASE_URL": "https://api.openai.com/v1",
  "RAG_OPENAI_API_KEY": "sk-..."
}
Azure OpenAI:
{
  "RAG_AZURE_OPENAI_BASE_URL": "https://your-instance.openai.azure.com",
  "RAG_AZURE_OPENAI_API_KEY": "your-key",
  "RAG_AZURE_OPENAI_API_VERSION": "2023-05-15"
}
3. Optimize Performance

{
  "RAG_EMBEDDING_BATCH_SIZE": 100,
  "ENABLE_ASYNC_EMBEDDING": true,
  "RAG_EMBEDDING_CONCURRENT_REQUESTS": 4
}
Changing embedding models requires re-embedding all existing documents. Plan migrations carefully.
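The batch size setting controls how many chunks are embedded per request. A minimal illustration of how batching divides the work (not Open WebUI's internal code):

```python
def batch(texts, batch_size=100):
    """Split texts into groups of RAG_EMBEDDING_BATCH_SIZE, one request each."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

chunks = [f"chunk {i}" for i in range(250)]
batches = batch(chunks, batch_size=100)
print([len(b) for b in batches])  # [100, 100, 50]
```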

Chunking Strategies

Optimize document chunking for better retrieval:

Text Splitters

RecursiveCharacterTextSplitter (Default)
  • Splits on multiple separators hierarchically
  • Preserves semantic meaning
  • Best for general content
{
  "TEXT_SPLITTER": "recursive",
  "CHUNK_SIZE": 1500,
  "CHUNK_OVERLAP": 200
}

Chunking Parameters

Optimal chunking balances:
  • Chunk Size: Larger = more context, fewer chunks
  • Overlap: Prevents information loss at boundaries
  • Min Size: Ensures chunks contain meaningful content
{
  "CHUNK_SIZE": 1500,              // Characters/tokens per chunk
  "CHUNK_MIN_SIZE_TARGET": 100,    // Minimum viable chunk
  "CHUNK_OVERLAP": 200             // Overlap between chunks
}
Hybrid Search

Combine vector and keyword search for better results:
{
  "ENABLE_RAG_HYBRID_SEARCH": true,
  "ENABLE_RAG_HYBRID_SEARCH_ENRICHED_TEXTS": true,
  "HYBRID_BM25_WEIGHT": 0.5  // 0=vector only, 1=BM25 only
}

How It Works

  1. Vector Search: Semantic similarity using embeddings
  2. BM25 Keyword Search: Traditional keyword matching with TF-IDF
  3. Score Fusion: Combine scores using the configured weight
  4. Reranking: Optionally rerank results with a dedicated model
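The score-fusion step can be sketched as a weighted sum. This illustrates what HYBRID_BM25_WEIGHT controls, not necessarily the exact formula Open WebUI uses:

```python
def fuse(vector_score: float, bm25_score: float, bm25_weight: float = 0.5) -> float:
    """Weighted fusion: weight 0 = vector only, 1 = BM25 only."""
    return (1 - bm25_weight) * vector_score + bm25_weight * bm25_score

print(fuse(0.9, 0.2, bm25_weight=0.0))        # 0.9  (vector only)
print(fuse(0.9, 0.2, bm25_weight=1.0))        # 0.2  (BM25 only)
print(round(fuse(0.9, 0.2, bm25_weight=0.5), 2))  # 0.55 (even blend)
```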

Reranking

Improve retrieval quality with reranking models.

Configuration

{
  "RAG_RERANKING_ENGINE": "",
  "RAG_RERANKING_MODEL": "BAAI/bge-reranker-large",
  "TOP_K_RERANKER": 10
}
Supported models:
  • BAAI/bge-reranker-*
  • jinaai/jina-colbert-v2
  • CrossEncoder models

Reranking Process

  1. Initial Retrieval: Get top N candidates (e.g., 50)
  2. Rerank: Score candidates with reranking model
  3. Filter: Keep top K (e.g., 10) best matches
  4. Threshold: Optionally filter by relevance score
{
  "TOP_K": 50,                    // Initial retrieval
  "TOP_K_RERANKER": 10,           // After reranking
  "RELEVANCE_THRESHOLD": 0.0      // Minimum score (0-1)
}

Web Search Integration

Enhance RAG with live web search.

Supported Providers

SearXNG

Self-hosted metasearch engine

Google PSE

Programmable Search Engine

Brave Search

Privacy-focused search API

Kagi

Premium search API

Tavily

AI-optimized search

Perplexity

AI-powered answers

Configuration Example

{
  "ENABLE_WEB_SEARCH": true,
  "WEB_SEARCH_ENGINE": "searxng",
  "SEARXNG_QUERY_URL": "http://searxng:8080/search",
  "WEB_SEARCH_RESULT_COUNT": 5,
  "WEB_SEARCH_CONCURRENT_REQUESTS": 3
}

Web Search Workflow

  1. Search Execution: Query the configured search provider for relevant URLs
  2. Content Loading: Fetch and extract text from web pages
  3. Processing: Chunk and embed web content
  4. RAG Integration: Combine with document library results

Using RAG in Chat

Accessing Documents

Reference documents in chat using the # command:
# Single document
#document-name What is the summary of this report?

# Multiple documents
#doc1 #doc2 #doc3 Compare these three documents

# Web URL
#https://example.com/article What does this article say about AI?

RAG Template

Customize how retrieved context is presented:
{
  "RAG_TEMPLATE": """Use the following context to answer the question:

Context:
{{CONTEXT}}

Question: {{QUERY}}"""
}
The {{CONTEXT}} placeholder is replaced with retrieved document chunks, and {{QUERY}} with the user’s question.
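Rendering the template is plain placeholder substitution, which can be sketched as:

```python
def render_rag_template(template: str, context: str, query: str) -> str:
    """Substitute retrieved chunks and the user's question into the template."""
    return template.replace("{{CONTEXT}}", context).replace("{{QUERY}}", query)

template = ("Use the following context to answer the question:\n\n"
            "Context:\n{{CONTEXT}}\n\nQuestion: {{QUERY}}")
print(render_rag_template(template, "[chunk 1] [chunk 2]", "What is RAG?"))
```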

Advanced Features

Full Context Mode

Bypass chunking for small documents:
{
  "RAG_FULL_CONTEXT": true
}
When enabled:
  • Small documents sent in full
  • Better context preservation
  • Higher token usage

Bypass Embedding

Skip vector search for specific use cases:
{
  "BYPASS_EMBEDDING_AND_RETRIEVAL": true
}
This disables RAG functionality. Documents won’t be searchable.

YouTube Integration

Extract transcripts from YouTube videos:
{
  "YOUTUBE_LOADER_LANGUAGE": ["en", "es", "fr"],
  "YOUTUBE_LOADER_PROXY_URL": "http://proxy:8080",
  "YOUTUBE_LOADER_TRANSLATION": "en"
}
Usage:
#https://youtube.com/watch?v=VIDEO_ID Summarize this video

Cloud Storage Integration

Import documents from cloud services:
{
  "ENABLE_GOOGLE_DRIVE_INTEGRATION": true
}
Features:
  • OAuth authentication
  • File picker interface
  • Automatic download and processing

Performance Optimization

Async Embedding

Parallelize embedding generation:
{
  "ENABLE_ASYNC_EMBEDDING": true,
  "RAG_EMBEDDING_CONCURRENT_REQUESTS": 4,
  "RAG_EMBEDDING_BATCH_SIZE": 100
}
Benefits:
  • Faster document processing
  • Better resource utilization
  • Configurable concurrency
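A minimal sketch of concurrency-limited embedding with asyncio; fake_embed stands in for a real embedding API call, and the semaphore plays the role of RAG_EMBEDDING_CONCURRENT_REQUESTS:

```python
import asyncio

async def embed_batches(batches, embed_one, concurrent_requests=4):
    """Run up to concurrent_requests embedding calls at once."""
    sem = asyncio.Semaphore(concurrent_requests)

    async def worker(batch):
        async with sem:
            return await embed_one(batch)

    return await asyncio.gather(*(worker(b) for b in batches))

async def fake_embed(batch):
    """Stand-in for a real embedding API call."""
    await asyncio.sleep(0)
    return [[0.0, 0.0, 0.0] for _ in batch]

batches = [["chunk a", "chunk b"], ["chunk c"]]
vectors = asyncio.run(embed_batches(batches, fake_embed))
print([len(v) for v in vectors])  # [2, 1]
```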

Web Loader Optimization

{
  "WEB_LOADER_CONCURRENT_REQUESTS": 5,
  "WEB_LOADER_TIMEOUT": "30",
  "ENABLE_WEB_LOADER_SSL_VERIFICATION": true
}

API Reference

RAG Configuration

# Get current settings
GET /api/v1/retrieval/config

# Update configuration
POST /api/v1/retrieval/config/update

Embedding Management

# Get embedding config
GET /api/v1/retrieval/embedding

# Update embedding model
POST /api/v1/retrieval/embedding/update
# Perform web search
POST /api/v1/retrieval/web/search
{
  "queries": ["search query 1", "search query 2"]
}
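An example of preparing the web-search call above from Python; the base URL, API key, and bearer-token header are illustrative assumptions:

```python
import json
from urllib import request

def build_web_search(base_url, token, queries):
    """Build (but do not send) a request to the web-search endpoint."""
    return request.Request(
        url=f"{base_url}/api/v1/retrieval/web/search",
        data=json.dumps({"queries": queries}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",  # assumes bearer-token auth
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_web_search("http://localhost:3000", "YOUR_API_KEY",
                       ["search query 1", "search query 2"])
print(req.full_url)  # http://localhost:3000/api/v1/retrieval/web/search
```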

Best Practices

Choose Right Chunk Size

  • Too small: Loss of context
  • Too large: Poor retrieval precision
  • Start with 1500 characters
  • Adjust based on content type

Use Hybrid Search

  • Better than vector-only for many queries
  • Combines semantic + keyword matching
  • Tune weight based on use case

Enable Reranking

  • Significantly improves result quality
  • Small performance cost
  • Worth it for production use

Monitor Embedding Costs

  • Track API usage for cloud providers
  • Consider local models for volume
  • Batch processing reduces costs
