Overview
Retrieval-Augmented Generation (RAG) enables agents to access and reason over external documents. Qwen-Agent provides built-in RAG capabilities through the Assistant class, supporting multiple document formats including PDF, Word, PowerPoint, TXT, and HTML.
Quick Start
Here’s a simple RAG example (examples/assistant_rag.py:19):
```python
from qwen_agent.agents import Assistant

bot = Assistant(
    llm={'model': 'qwen-plus-latest'}
)

messages = [{
    'role': 'user',
    'content': [
        {'text': 'What does this document say about transformers?'},
        {'file': 'https://arxiv.org/pdf/1706.03762.pdf'}
    ]
}]

for response in bot.run(messages):
    print(response)
```
How RAG Works in Qwen-Agent
The RAG workflow in the Assistant class (qwen_agent/agents/assistant.py:100):
```
1. User Query + Documents
        ↓
2. Document Parsing & Chunking
        ↓
3. Retrieval (find relevant chunks)
        ↓
4. Knowledge Formatting
        ↓
5. LLM Generation with Context
        ↓
6. Response
```
Basic Usage
Single Document Q&A
```python
from qwen_agent.agents import Assistant

bot = Assistant(llm={'model': 'qwen-max'})

# Query with a PDF attached
messages = [{
    'role': 'user',
    'content': [
        {'text': 'Summarize the key points'},
        {'file': 'document.pdf'}
    ]
}]

for response in bot.run(messages):
    print(response)
```
Multiple Document Q&A
```python
messages = [{
    'role': 'user',
    'content': [
        {'text': 'Compare the findings in these two papers'},
        {'file': 'paper1.pdf'},
        {'file': 'paper2.pdf'}
    ]
}]

for response in bot.run(messages):
    print(response)
```
Pre-loaded Knowledge Base
Load documents when initializing the agent:
```python
bot = Assistant(
    llm={'model': 'qwen-max'},
    files=[
        'docs/manual.pdf',
        'docs/faq.html',
        'docs/guide.txt'
    ]
)

# Documents are available for all queries
messages = [{'role': 'user', 'content': 'What does the manual say about installation?'}]

for response in bot.run(messages):
    print(response)
```
Supported Document Formats
The Assistant automatically handles multiple formats:
| Format | Extension | Notes |
|--------|-----------|-------|
| PDF | .pdf | Text extraction with layout preservation |
| Word | .docx, .doc | Full text and table extraction |
| PowerPoint | .pptx, .ppt | Slide text extraction |
| HTML | .html, .htm | Web page content |
| Text | .txt, .md | Plain text documents |
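As an illustration of how format handling might be organized, here is a hypothetical extension-to-parser dispatch. The parser names and the `pick_parser` helper are assumptions for the sketch; Qwen-Agent selects parsers internally and does not expose this mapping.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a parser name.
PARSERS = {
    '.pdf': 'pdf_parser',
    '.docx': 'word_parser', '.doc': 'word_parser',
    '.pptx': 'ppt_parser', '.ppt': 'ppt_parser',
    '.html': 'html_parser', '.htm': 'html_parser',
    '.txt': 'text_parser', '.md': 'text_parser',
}

def pick_parser(filename: str) -> str:
    """Return the parser name for a file, matching extensions case-insensitively."""
    ext = Path(filename).suffix.lower()
    try:
        return PARSERS[ext]
    except KeyError:
        raise ValueError(f'Unsupported format: {ext or filename}')
```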
Remote Files
Remote documents can be referenced by URL:

```python
messages = [{
    'role': 'user',
    'content': [
        {'text': 'Summarize this paper'},
        {'file': 'https://arxiv.org/pdf/2307.09288.pdf'}  # Remote URL
    ]
}]
```
Knowledge Formatting
The Assistant formats retrieved knowledge into structured prompts (qwen_agent/agents/assistant.py:27):
```
# Knowledge Base

## The content from [document.pdf]:

[Retrieved content chunk 1]

## The content from [paper.pdf]:

[Retrieved content chunk 2]
```

When the conversation is in Chinese, the same template is rendered in Chinese:

```
# 知识库

## 来自 [document.pdf] 的内容:

[检索到的内容片段 1]
```

The formatted knowledge is prepended to the system message before LLM generation; the template language is detected automatically from the conversation.
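A minimal sketch of this formatting step, following the template shown above (the `format_knowledge` function and its `chunks` argument shape are illustrative, not Qwen-Agent's internal code):

```python
def format_knowledge(chunks: dict[str, list[str]]) -> str:
    """Render retrieved chunks into a knowledge-base prompt.

    `chunks` maps a source filename to its retrieved text snippets.
    """
    lines = ['# Knowledge Base']
    for source, snippets in chunks.items():
        lines.append(f'## The content from [{source}]:')
        lines.extend(snippets)
    return '\n'.join(lines)
```

The resulting string would then be prepended to the system message before the LLM call.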
Advanced Configuration
RAG Configuration Options
Customize RAG behavior with the `rag_cfg` parameter:
```python
bot = Assistant(
    llm={'model': 'qwen-max'},
    files=['knowledge_base.pdf'],
    rag_cfg={
        'chunk_size': 500,            # Characters per chunk
        'chunk_overlap': 50,          # Overlap between chunks
        'top_k': 5,                   # Number of chunks to retrieve
        'similarity_threshold': 0.7   # Minimum similarity score
    }
)
```
Custom Knowledge Injection
Manually provide knowledge instead of automatic retrieval (qwen_agent/agents/assistant.py:103):
```python
bot = Assistant(llm={'model': 'qwen-max'})

# Provide pre-retrieved knowledge
external_knowledge = """
Key findings from research:
1. Model architecture uses attention mechanisms
2. Training requires large-scale datasets
3. Performance scales with model size
"""

messages = [{'role': 'user', 'content': 'Explain the key findings'}]

for response in bot.run(messages, knowledge=external_knowledge):
    print(response)
```
Document Parsing Details
The document parser extracts text and maintains structure:
Parser Features
- Layout Preservation: Maintains document structure
- Table Extraction: Converts tables to text
- Image Handling: Can reference image locations
- Metadata: Extracts page numbers and sections
Chunking Strategy
Documents are split into overlapping chunks:
```
Document: "This is a long document about AI..."

Chunk 1: "This is a long document about AI..."
Chunk 2: "...about AI and machine learning..."
Chunk 3: "...machine learning applications..."
```

The overlap between consecutive chunks preserves context across chunk boundaries.
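A minimal sketch of overlapping character-based chunking, using the same `chunk_size`/`overlap` parameters as the `rag_cfg` example above (the function itself is illustrative, not Qwen-Agent's implementation):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of chunk_size characters, where each chunk
    shares its first `overlap` characters with the end of the previous one."""
    if overlap >= chunk_size:
        raise ValueError('overlap must be smaller than chunk_size')
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

For example, with `chunk_size=5` and `overlap=2`, the string `'abcdefghij'` splits into `'abcde'`, `'defgh'`, `'ghij'`, with each boundary repeated across two chunks.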
Retrieval Strategies
Default: Semantic Search
The Assistant uses embedding-based semantic search:
```python
# Automatic semantic retrieval
bot = Assistant(
    llm={'model': 'qwen-max'},
    files=['database.pdf']
)

messages = [{
    'role': 'user',
    'content': 'What are the best practices?'  # Finds semantically similar content
}]
```
Parallel Document QA
For processing multiple documents in parallel:
```python
from qwen_agent.agents import ParallelDocQA

bot = ParallelDocQA(
    llm={'model': 'qwen-max'}
)

messages = [{
    'role': 'user',
    'content': [
        {'text': 'Compare architectures'},
        {'file': 'paper1.pdf'},
        {'file': 'paper2.pdf'},
        {'file': 'paper3.pdf'}
    ]
}]

# Processes each document in parallel for faster results
for response in bot.run(messages):
    print(response)
```
Complete Example: Document Assistant
Here’s a complete RAG application with GUI (examples/assistant_rag.py):
```python
from qwen_agent.agents import Assistant
from qwen_agent.gui import WebUI


def app_gui():
    # Configure the assistant
    bot = Assistant(
        llm={'model': 'qwen-plus-latest'},
        name='Assistant',
        description='Use RAG to answer questions. Supports: PDF/Word/PPT/TXT/HTML.'
    )
    # Configure the UI with prompt suggestions
    chatbot_config = {
        'prompt.suggestions': [
            {'text': 'Introduce figure one'},
            {'text': 'What does the second chapter say?'},
        ]
    }
    WebUI(bot, chatbot_config=chatbot_config).run()


if __name__ == '__main__':
    app_gui()
```
Combining RAG with Tools
Combine RAG with code execution and other tools:
```python
from qwen_agent.agents import Assistant

bot = Assistant(
    llm={'model': 'qwen-max'},
    function_list=['code_interpreter'],  # Add code execution
    files=['data_analysis_guide.pdf']    # Knowledge base
)

messages = [{
    'role': 'user',
    'content': [
        {'text': 'Based on the guide, analyze this CSV file'},
        {'file': 'data.csv'}
    ]
}]

# The agent can:
# 1. Retrieve relevant info from the guide
# 2. Execute code to analyze the CSV
# 3. Combine knowledge and analysis in its response
for response in bot.run(messages):
    print(response)
```
Multi-Turn Conversations
RAG works across conversation turns:
```python
bot = Assistant(
    llm={'model': 'qwen-max'},
    files=['technical_manual.pdf']
)

messages = []

# Turn 1
messages.append({'role': 'user', 'content': 'What is the installation process?'})
for response in bot.run(messages):
    print(response)
messages.extend(response)

# Turn 2 - continues with the same knowledge base
messages.append({'role': 'user', 'content': 'What about troubleshooting?'})
for response in bot.run(messages):
    print(response)
messages.extend(response)

# Turn 3
messages.append({'role': 'user', 'content': 'Can you show me an example?'})
for response in bot.run(messages):
    print(response)
```
Best Practices
RAG Optimization Tips
- Use specific queries for better retrieval accuracy
- Pre-load frequently accessed documents via the `files` parameter
- Adjust `chunk_size` to the document structure (larger for technical docs)
- Use `top_k` to balance context length against relevance
- Combine with tools (code execution, search, etc.) for enhanced capabilities
Common Pitfalls
- Very large documents may require chunking optimization
- Complex layouts (multi-column PDFs) may affect extraction quality
- Ensure document URLs are accessible if using remote files
- Retrieved context counts toward LLM token limits
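The last pitfall can be estimated with simple arithmetic. The sketch below assumes roughly 4 characters per token, a common rule of thumb rather than a property of any particular tokenizer, and the `context_token_estimate` helper is hypothetical:

```python
import math

def context_token_estimate(chunk_size: int, top_k: int,
                           chars_per_token: float = 4.0) -> int:
    """Rough upper bound on tokens consumed by retrieved context,
    assuming ~4 characters per token (a rule of thumb, not a tokenizer)."""
    return math.ceil(chunk_size * top_k / chars_per_token)
```

With the earlier defaults (`chunk_size=500`, `top_k=5`), retrieved context alone costs roughly 625 tokens before counting the query, history, or system prompt.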
Troubleshooting
Document Parsing Errors
```python
# Handle parsing errors gracefully
try:
    messages = [{'role': 'user', 'content': [{'file': 'corrupted.pdf'}]}]
    for response in bot.run(messages):
        print(response)
except Exception as e:
    print(f"Document parsing failed: {e}")
```
No Relevant Content Retrieved
```python
# Adjust retrieval parameters
bot = Assistant(
    llm={'model': 'qwen-max'},
    files=['docs.pdf'],
    rag_cfg={
        'top_k': 10,                  # Retrieve more chunks
        'similarity_threshold': 0.5   # Lower threshold
    }
)
```

For large document collections, use smaller chunks:

```python
bot = Assistant(
    llm={'model': 'qwen-max'},
    files=['large_doc.pdf'],
    rag_cfg={
        'chunk_size': 300,  # Smaller chunks
        'top_k': 3          # Fewer retrieved chunks
    }
)
```
Next Steps