
Overview

Open WebUI’s RAG (Retrieval-Augmented Generation) system enables powerful document-based chat interactions by combining vector search, hybrid retrieval, and multiple content extraction engines.

Document Upload & Processing

Supported File Types

Open WebUI supports a wide range of file formats:
  • PDF (with OCR support)
  • Word (DOC, DOCX)
  • PowerPoint (PPT, PPTX)
  • Excel (XLS, XLSX)
  • Plain text (TXT, MD)
  • Rich text (RTF)

Content Extraction Engines

Choose from multiple extraction engines based on your needs:

Tika

Apache Tika - Universal document parser
  • Supports 1000+ file formats
  • Metadata extraction
  • Self-hosted option

Docling

IBM Docling - AI-powered extraction
  • Advanced layout understanding
  • Table structure preservation
  • High accuracy for complex documents

Document Intelligence

Azure Document Intelligence
  • Cloud-based OCR
  • Form recognition
  • Layout analysis
  • Custom model support

Mistral OCR

Mistral OCR API
  • AI-powered image text extraction
  • Multi-language support
  • High-quality results

Configuration

Configure content extraction settings:
# From routers/retrieval.py:481-511
{
  "CONTENT_EXTRACTION_ENGINE": "tika",  // tika, docling, azure, mistral
  "PDF_EXTRACT_IMAGES": true,
  "PDF_LOADER_MODE": "auto",  // auto, fast, quality
  
  // Tika settings
  "TIKA_SERVER_URL": "http://tika:9998",
  
  // Docling settings
  "DOCLING_SERVER_URL": "http://docling:5000",
  "DOCLING_API_KEY": "your-api-key",
  "DOCLING_PARAMS": {...},
  
  // Azure Document Intelligence
  "DOCUMENT_INTELLIGENCE_ENDPOINT": "https://your-instance.cognitiveservices.azure.com",
  "DOCUMENT_INTELLIGENCE_KEY": "your-key",
  "DOCUMENT_INTELLIGENCE_MODEL": "prebuilt-read",
  
  // Mistral OCR
  "MISTRAL_OCR_API_BASE_URL": "https://api.mistral.ai/v1",
  "MISTRAL_OCR_API_KEY": "your-key"
}

Vector Database Support

Open WebUI supports nine vector database options. The default is an embedded database:
  • No external dependencies
  • Perfect for single-node deployments
  • Persistent storage

Embedding Configuration

Embedding Models

Configure embedding generation:
1. Choose Engine

{
  "RAG_EMBEDDING_ENGINE": "ollama",  // "", ollama, openai, azure_openai
  "RAG_EMBEDDING_MODEL": "nomic-embed-text"
}
2. Configure Provider

Ollama:
{
  "RAG_OLLAMA_BASE_URL": "http://ollama:11434",
  "RAG_OLLAMA_API_KEY": ""
}
OpenAI:
{
  "RAG_OPENAI_API_BASE_URL": "https://api.openai.com/v1",
  "RAG_OPENAI_API_KEY": "sk-..."
}
Azure OpenAI:
{
  "RAG_AZURE_OPENAI_BASE_URL": "https://your-instance.openai.azure.com",
  "RAG_AZURE_OPENAI_API_KEY": "your-key",
  "RAG_AZURE_OPENAI_API_VERSION": "2023-05-15"
}
3. Optimize Performance

{
  "RAG_EMBEDDING_BATCH_SIZE": 100,
  "ENABLE_ASYNC_EMBEDDING": true,
  "RAG_EMBEDDING_CONCURRENT_REQUESTS": 4
}
Changing embedding models requires re-embedding all existing documents. Plan migrations carefully.
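The batch size setting controls how many chunks are embedded per request. A minimal illustration of how batching divides the work (not Open WebUI's internal code):

```python
def batch(texts, batch_size=100):
    """Split texts into groups of RAG_EMBEDDING_BATCH_SIZE, one request each."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

chunks = [f"chunk {i}" for i in range(250)]
batches = batch(chunks, batch_size=100)
print([len(b) for b in batches])  # [100, 100, 50]
```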

Chunking Strategies

Optimize document chunking for better retrieval:

Text Splitters

RecursiveCharacterTextSplitter (Default)
  • Splits on multiple separators hierarchically
  • Preserves semantic meaning
  • Best for general content
{
  "TEXT_SPLITTER": "recursive",
  "CHUNK_SIZE": 1500,
  "CHUNK_OVERLAP": 200
}

Chunking Parameters

Optimal chunking balances:
  • Chunk Size: Larger = more context, fewer chunks
  • Overlap: Prevents information loss at boundaries
  • Min Size: Ensures chunks contain meaningful content
{
  "CHUNK_SIZE": 1500,              // Characters/tokens per chunk
  "CHUNK_MIN_SIZE_TARGET": 100,    // Minimum viable chunk
  "CHUNK_OVERLAP": 200             // Overlap between chunks
}
Hybrid Search

Combine vector and keyword search for better results:
{
  "ENABLE_RAG_HYBRID_SEARCH": true,
  "ENABLE_RAG_HYBRID_SEARCH_ENRICHED_TEXTS": true,
  "HYBRID_BM25_WEIGHT": 0.5  // 0=vector only, 1=BM25 only
}

How It Works

  1. Vector Search: Semantic similarity using embeddings
  2. BM25 Keyword Search: Traditional keyword matching with TF-IDF
  3. Score Fusion: Combine scores using the configured weight
  4. Reranking: Optionally rerank results with a dedicated model
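The score-fusion step can be sketched as a weighted sum. This illustrates what HYBRID_BM25_WEIGHT controls, not necessarily the exact formula Open WebUI uses:

```python
def fuse(vector_score: float, bm25_score: float, bm25_weight: float = 0.5) -> float:
    """Weighted fusion: weight 0 = vector only, 1 = BM25 only."""
    return (1 - bm25_weight) * vector_score + bm25_weight * bm25_score

print(fuse(0.9, 0.2, bm25_weight=0.0))        # 0.9  (vector only)
print(fuse(0.9, 0.2, bm25_weight=1.0))        # 0.2  (BM25 only)
print(round(fuse(0.9, 0.2, bm25_weight=0.5), 2))  # 0.55 (even blend)
```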

Reranking

Improve retrieval quality with reranking models.

Configuration

{
  "RAG_RERANKING_ENGINE": "",
  "RAG_RERANKING_MODEL": "BAAI/bge-reranker-large",
  "TOP_K_RERANKER": 10
}
Supported models:
  • BAAI/bge-reranker-*
  • jinaai/jina-colbert-v2
  • CrossEncoder models

Reranking Process

  1. Initial Retrieval: Get top N candidates (e.g., 50)
  2. Rerank: Score candidates with reranking model
  3. Filter: Keep top K (e.g., 10) best matches
  4. Threshold: Optionally filter by relevance score
{
  "TOP_K": 50,                    // Initial retrieval
  "TOP_K_RERANKER": 10,           // After reranking
  "RELEVANCE_THRESHOLD": 0.0      // Minimum score (0-1)
}

Web Search Integration

Enhance RAG with live web search.

Supported Providers

SearXNG

Self-hosted metasearch engine

Google PSE

Programmable Search Engine

Brave Search

Privacy-focused search API

Kagi

Premium search API

Tavily

AI-optimized search

Perplexity

AI-powered answers

Configuration Example

{
  "ENABLE_WEB_SEARCH": true,
  "WEB_SEARCH_ENGINE": "searxng",
  "SEARXNG_QUERY_URL": "http://searxng:8080/search",
  "WEB_SEARCH_RESULT_COUNT": 5,
  "WEB_SEARCH_CONCURRENT_REQUESTS": 3
}

Web Search Workflow

  1. Search Execution: Query the configured search provider for relevant URLs
  2. Content Loading: Fetch and extract text from web pages
  3. Processing: Chunk and embed web content
  4. RAG Integration: Combine with document library results

Using RAG in Chat

Accessing Documents

Reference documents in chat using the # command:
# Single document
#document-name What is the summary of this report?

# Multiple documents
#doc1 #doc2 #doc3 Compare these three documents

# Web URL
#https://example.com/article What does this article say about AI?

RAG Template

Customize how retrieved context is presented:
{
  "RAG_TEMPLATE": """Use the following context to answer the question:

Context:
{{CONTEXT}}

Question: {{QUERY}}"""
}
The {{CONTEXT}} placeholder is replaced with retrieved document chunks, and {{QUERY}} with the user’s question.
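Rendering the template is plain placeholder substitution, which can be sketched as:

```python
def render_rag_template(template: str, context: str, query: str) -> str:
    """Substitute retrieved chunks and the user's question into the template."""
    return template.replace("{{CONTEXT}}", context).replace("{{QUERY}}", query)

template = ("Use the following context to answer the question:\n\n"
            "Context:\n{{CONTEXT}}\n\nQuestion: {{QUERY}}")
print(render_rag_template(template, "[chunk 1] [chunk 2]", "What is RAG?"))
```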

Advanced Features

Full Context Mode

Bypass chunking for small documents:
{
  "RAG_FULL_CONTEXT": true
}
When enabled:
  • Small documents sent in full
  • Better context preservation
  • Higher token usage

Bypass Embedding

Skip vector search for specific use cases:
{
  "BYPASS_EMBEDDING_AND_RETRIEVAL": true
}
This disables RAG functionality. Documents won’t be searchable.

YouTube Integration

Extract transcripts from YouTube videos:
{
  "YOUTUBE_LOADER_LANGUAGE": ["en", "es", "fr"],
  "YOUTUBE_LOADER_PROXY_URL": "http://proxy:8080",
  "YOUTUBE_LOADER_TRANSLATION": "en"
}
Usage:
#https://youtube.com/watch?v=VIDEO_ID Summarize this video

Cloud Storage Integration

Import documents from cloud services:
{
  "ENABLE_GOOGLE_DRIVE_INTEGRATION": true
}
Features:
  • OAuth authentication
  • File picker interface
  • Automatic download and processing

Performance Optimization

Async Embedding

Parallelize embedding generation:
{
  "ENABLE_ASYNC_EMBEDDING": true,
  "RAG_EMBEDDING_CONCURRENT_REQUESTS": 4,
  "RAG_EMBEDDING_BATCH_SIZE": 100
}
Benefits:
  • Faster document processing
  • Better resource utilization
  • Configurable concurrency
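A minimal sketch of concurrency-limited embedding with asyncio; fake_embed stands in for a real embedding API call, and the semaphore plays the role of RAG_EMBEDDING_CONCURRENT_REQUESTS:

```python
import asyncio

async def embed_batches(batches, embed_one, concurrent_requests=4):
    """Run up to concurrent_requests embedding calls at once."""
    sem = asyncio.Semaphore(concurrent_requests)

    async def worker(batch):
        async with sem:
            return await embed_one(batch)

    return await asyncio.gather(*(worker(b) for b in batches))

async def fake_embed(batch):
    """Stand-in for a real embedding API call."""
    await asyncio.sleep(0)
    return [[0.0, 0.0, 0.0] for _ in batch]

batches = [["chunk a", "chunk b"], ["chunk c"]]
vectors = asyncio.run(embed_batches(batches, fake_embed))
print([len(v) for v in vectors])  # [2, 1]
```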

Web Loader Optimization

{
  "WEB_LOADER_CONCURRENT_REQUESTS": 5,
  "WEB_LOADER_TIMEOUT": "30",
  "ENABLE_WEB_LOADER_SSL_VERIFICATION": true
}

API Reference

RAG Configuration

# Get current settings
GET /api/v1/retrieval/config

# Update configuration
POST /api/v1/retrieval/config/update

Embedding Management

# Get embedding config
GET /api/v1/retrieval/embedding

# Update embedding model
POST /api/v1/retrieval/embedding/update
# Perform web search
POST /api/v1/retrieval/web/search
{
  "queries": ["search query 1", "search query 2"]
}
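An example of preparing the web-search call above from Python; the base URL, API key, and bearer-token header are illustrative assumptions:

```python
import json
from urllib import request

def build_web_search(base_url, token, queries):
    """Build (but do not send) a request to the web-search endpoint."""
    return request.Request(
        url=f"{base_url}/api/v1/retrieval/web/search",
        data=json.dumps({"queries": queries}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",  # assumes bearer-token auth
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_web_search("http://localhost:3000", "YOUR_API_KEY",
                       ["search query 1", "search query 2"])
print(req.full_url)  # http://localhost:3000/api/v1/retrieval/web/search
```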

Best Practices

Choose Right Chunk Size

  • Too small: Loss of context
  • Too large: Poor retrieval precision
  • Start with 1500 characters
  • Adjust based on content type

Use Hybrid Search

  • Better than vector-only for many queries
  • Combines semantic + keyword matching
  • Tune weight based on use case

Enable Reranking

  • Significantly improves result quality
  • Small performance cost
  • Worth it for production use

Monitor Embedding Costs

  • Track API usage for cloud providers
  • Consider local models for volume
  • Batch processing reduces costs
