Overview
Open WebUI’s RAG (Retrieval-Augmented Generation) system enables powerful document-based chat interactions by combining vector search, hybrid retrieval, and multiple content extraction engines.

Document Upload & Processing
Supported File Types
Open WebUI supports a wide range of file formats across four categories: Documents, Code & Data, Images, and Web Content. The Documents category includes:
- PDF (with OCR support)
- Word (DOC, DOCX)
- PowerPoint (PPT, PPTX)
- Excel (XLS, XLSX)
- Plain text (TXT, MD)
- Rich text (RTF)
Content Extraction Engines
Choose from multiple extraction engines based on your needs:

Tika
Apache Tika - Universal document parser
- Supports 1000+ file formats
- Metadata extraction
- Self-hosted option
Docling
IBM Docling - AI-powered extraction
- Advanced layout understanding
- Table structure preservation
- High accuracy for complex documents
Document Intelligence
Azure Document Intelligence
- Cloud-based OCR
- Form recognition
- Layout analysis
- Custom model support
Mistral OCR
Mistral OCR API
- AI-powered image text extraction
- Multi-language support
- High-quality results
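As an illustration, selecting one of these engines usually comes down to a couple of settings, supplied to the server as environment variables. The variable names below are modeled on Open WebUI's env-based configuration and should be checked against the docs for your version:

```python
import os

# Example: select a self-hosted Apache Tika server as the extraction engine.
# Variable names are assumptions; verify against your Open WebUI version.
os.environ["CONTENT_EXTRACTION_ENGINE"] = "tika"
os.environ["TIKA_SERVER_URL"] = "http://localhost:9998"
```

In a typical deployment these would be set in the container environment (e.g., `docker run -e ...`) rather than from Python.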
Configuration
Configure content extraction settings in the Admin Panel or via environment variables.

Vector Database Support
Open WebUI supports 9 vector database options:
- ChromaDB
- PostgreSQL
- Qdrant
- Cloud options

ChromaDB
Default embedded database
- No external dependencies
- Perfect for single-node deployments
- Persistent storage
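Switching away from the embedded default is typically a matter of naming the backend and pointing Open WebUI at it via environment variables. The names below (Qdrant as the example) are assumptions modeled on Open WebUI's env-based configuration:

```python
import os

# Example: replace the embedded ChromaDB default with an external Qdrant.
# Variable names are assumptions; verify against your Open WebUI version.
os.environ["VECTOR_DB"] = "qdrant"
os.environ["QDRANT_URI"] = "http://localhost:6333"
```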
Embedding Configuration
Embedding Models
Configure which engine and model are used to generate embeddings for uploaded documents.

Chunking Strategies
Optimize document chunking for better retrieval:

Text Splitters
- Recursive
- Token-Based
- Markdown
RecursiveCharacterTextSplitter (Default)
- Splits on multiple separators hierarchically
- Preserves semantic meaning
- Best for general content
Chunking Parameters
Optimal chunking balances:
- Chunk Size: Larger = more context, fewer chunks
- Overlap: Prevents information loss at boundaries
- Min Size: Ensures chunks contain meaningful content
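These trade-offs are easiest to see in a toy character-based chunker. This is a simplified stand-in for the real splitters (which also prefer paragraph and sentence boundaries over hard cuts), not Open WebUI's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks that overlap at the boundaries.

    The overlap means a sentence cut off at the end of one chunk
    reappears at the start of the next, so no boundary is "lost".
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# A 4000-character document with varied content.
doc = "".join(chr(65 + (i % 26)) for i in range(4000))
chunks = chunk_text(doc, chunk_size=1500, overlap=100)  # yields 3 chunks
```

Doubling `chunk_size` here would roughly halve the chunk count, giving each retrieved chunk more context but making retrieval coarser.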
Hybrid Search
Combine vector and keyword search for better results.

Enabling Hybrid Search
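Enabling it is typically a single toggle plus a result-count setting. The variable names below are assumptions modeled on Open WebUI's env-based configuration:

```python
import os

# Example: turn on hybrid (keyword + vector) retrieval.
# Variable names are assumptions; verify against your Open WebUI version.
os.environ["ENABLE_RAG_HYBRID_SEARCH"] = "true"
os.environ["RAG_TOP_K"] = "10"
```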
How It Works
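Conceptually, hybrid search runs a keyword (BM25-style) pass and a semantic vector pass over the same collection, normalizes each retriever's scores, and fuses them with a tunable weight. A sketch of that fusion step (illustrative, not Open WebUI's actual implementation):

```python
def fuse_scores(bm25: dict[str, float], vector: dict[str, float],
                alpha: float = 0.5) -> list[tuple[str, float]]:
    """Weighted fusion of keyword (BM25) and vector-similarity scores.

    alpha = 1.0 -> pure keyword ranking; alpha = 0.0 -> pure vector ranking.
    Each retriever's scores are min-max normalized before mixing, since
    BM25 and cosine similarity live on different scales.
    """
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    nb, nv = normalize(bm25), normalize(vector)
    docs = set(nb) | set(nv)
    fused = {d: alpha * nb.get(d, 0.0) + (1 - alpha) * nv.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

ranked = fuse_scores(
    bm25={"doc1": 12.0, "doc2": 3.0},     # keyword hits
    vector={"doc2": 0.9, "doc3": 0.7},    # semantic hits
    alpha=0.5,
)
```

Note how `doc2`, found by both retrievers, competes with `doc1`, the top keyword-only hit; tuning `alpha` shifts that balance.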
Reranking
Improve retrieval quality with reranking models.

Configuration
Reranking can run with local models or through an external API. Local model options include:
- BAAI/bge-reranker-*
- jinaai/jina-colbert-v2
- CrossEncoder models
Reranking Process
1. Initial Retrieval: get the top N candidates (e.g., 50)
2. Rerank: score each candidate with the reranking model
3. Filter: keep the top K best matches (e.g., 10)
4. Threshold: optionally drop results below a relevance score
Web Search Integration
Enhance RAG with live web search.

Supported Providers
SearXNG
Self-hosted metasearch engine
Google PSE
Programmable Search Engine
Brave Search
Privacy-focused search API
Kagi
Premium search API
Tavily
AI-optimized search
Perplexity
AI-powered answers
Configuration Example
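As an illustration, wiring up a self-hosted SearXNG instance typically involves settings like these. The variable names are assumptions modeled on Open WebUI's env-based configuration; the `<query>` placeholder is where the search term is substituted:

```python
import os

# Example: enable web search backed by a local SearXNG instance.
# Variable names are assumptions; verify against your Open WebUI version.
os.environ["ENABLE_RAG_WEB_SEARCH"] = "true"
os.environ["RAG_WEB_SEARCH_ENGINE"] = "searxng"
os.environ["SEARXNG_QUERY_URL"] = "http://localhost:8080/search?q=<query>"
```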
Web Search Workflow
Using RAG in Chat
Accessing Documents
Reference documents in chat using the `#` command: type `#` followed by the name of an uploaded document or knowledge collection to attach it to the conversation.
RAG Template
Customize how retrieved context is presented to the model.

Advanced Features
Full Context Mode
Bypass chunking for small documents:
- Small documents sent in full
- Better context preservation
- Higher token usage
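The decision can be as simple as a size threshold; an illustrative sketch (the character-based heuristic and the threshold value are assumptions, not Open WebUI's actual rule):

```python
def should_send_full(document: str, max_chars: int = 6000) -> bool:
    """Illustrative policy: skip chunking and retrieval for documents
    small enough to fit comfortably in the model's context window."""
    return len(document) <= max_chars

memo = "A short policy memo." * 50        # ~1000 chars: sent in full
report = "A long annual report. " * 1000  # far too large: chunked
```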
Bypass Embedding
Skip vector search entirely for use cases where retrieval is unnecessary.

YouTube Integration
Extract transcripts from YouTube videos for use as chat context.

Cloud Storage Integration
Import documents from cloud services:
- Google Drive
- OneDrive
- OAuth authentication
- File picker interface
- Automatic download and processing
Performance Optimization
Async Embedding
Parallelize embedding generation:
- Faster document processing
- Better resource utilization
- Configurable concurrency
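The idea can be sketched with a thread pool that embeds batches concurrently; the `embed_batch` stub below stands in for a real local-model or API call:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_batch(batch: list[str]) -> list[list[float]]:
    # Stand-in for a real embedding call (local model or remote API).
    return [[float(len(text))] for text in batch]

def embed_all(chunks: list[str], batch_size: int = 8,
              workers: int = 4) -> list[list[float]]:
    """Embed chunks as parallel batches rather than one request at a time."""
    batches = [chunks[i:i + batch_size]
               for i in range(0, len(chunks), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(embed_batch, batches)  # preserves batch order
    return [vec for batch in results for vec in batch]

vectors = embed_all([f"chunk {i}" for i in range(20)], batch_size=8, workers=4)
```

With a network-bound embedding API, the speedup is roughly proportional to the worker count until the provider's rate limit is hit, which is why the concurrency is configurable.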
Web Loader Optimization
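Details vary by version, but web-loader optimization is largely about fetching result pages concurrently instead of sequentially, since page downloads dominate web-search RAG latency. An illustrative asyncio sketch with a stubbed fetch (not Open WebUI's actual loader):

```python
import asyncio

async def fetch(url: str) -> str:
    # Stand-in for an async HTTP GET plus HTML-to-text conversion.
    await asyncio.sleep(0)
    return f"<content of {url}>"

async def load_pages(urls: list[str], concurrency: int = 10) -> list[str]:
    """Fetch pages concurrently, capped by a semaphore so a large
    result set does not open unbounded connections."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(url: str) -> str:
        async with sem:
            return await fetch(url)

    return list(await asyncio.gather(*(bounded(u) for u in urls)))

pages = asyncio.run(load_pages([f"https://example.com/{i}" for i in range(5)]))
```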
API Reference
RAG Configuration
Embedding Management
Web Search
Best Practices
Choose the Right Chunk Size
- Too small: Loss of context
- Too large: Poor retrieval precision
- Start with 1500 characters
- Adjust based on content type
Use Hybrid Search
- Better than vector-only for many queries
- Combines semantic + keyword matching
- Tune weight based on use case
Enable Reranking
- Significantly improves result quality
- Small performance cost
- Worth it for production use
Monitor Embedding Costs
- Track API usage for cloud providers
- Consider local models for volume
- Batch processing reduces costs