Overview

Retrieval is a RAG (Retrieval-Augmented Generation) tool: given a query and a list of documents, it returns the passages most relevant to that query.

Class Signature

from qwen_agent.tools import Retrieval

class Retrieval(BaseTool):
    name = 'retrieval'
    description = 'Retrieve relevant content from documents'
    parameters = {
        'type': 'object',
        'properties': {
            'query': {
                'description': 'Keywords for matching content',
                'type': 'string',
            },
            'files': {
                'description': 'File paths or URLs',
                'type': 'array',
                'items': {'type': 'string'}
            }
        },
        'required': ['query', 'files'],
    }
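
The schema above marks both query and files as required. As a rough sketch of what that implies for callers, the snippet below checks an argument dict against a copy of the schema before it is passed to the tool. The helper name check_args is hypothetical and is not part of qwen_agent; the library may validate arguments differently.

```python
# Schema copied from the class signature above.
PARAMETERS = {
    'type': 'object',
    'properties': {
        'query': {'description': 'Keywords for matching content', 'type': 'string'},
        'files': {'description': 'File paths or URLs', 'type': 'array', 'items': {'type': 'string'}},
    },
    'required': ['query', 'files'],
}


def check_args(args: dict, schema: dict = PARAMETERS) -> list:
    """Hypothetical helper: return a list of problems; an empty list means the args look valid."""
    problems = []
    # Every key listed under 'required' must be present.
    for key in schema['required']:
        if key not in args:
            problems.append(f'missing required argument: {key}')
    # 'query' is declared as a string.
    if 'query' in args and not isinstance(args['query'], str):
        problems.append('query must be a string')
    # 'files' is declared as an array of strings.
    files = args.get('files')
    if files is not None and not (isinstance(files, list) and all(isinstance(f, str) for f in files)):
        problems.append('files must be a list of strings')
    return problems


print(check_args({'query': 'installation requirements', 'files': ['user_manual.pdf']}))
print(check_args({'query': 'installation requirements'}))
```

Running such a check before calling the tool can surface a missing files list early, instead of at retrieval time.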

Parameters

query (str, required): Search query with keywords.
files (List[str], required): Document files to search (URLs or local paths).
max_ref_token (int, default: 4000): Maximum number of tokens to retrieve.
parser_page_size (int, default: 500): Chunk size for document parsing.
rag_searchers (List[str]): Search strategies: ['keyword_search', 'vector_search', 'hybrid_search'].
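
To give a feel for how parser_page_size and max_ref_token could interact, here is a minimal sketch that splits text into fixed-size chunks and then keeps chunks until a token budget is used up. It is an illustration only: whitespace-separated words stand in for tokens, the function names are hypothetical, and the library's own parser and tokenizer will behave differently.

```python
def split_into_chunks(text: str, page_size: int = 500) -> list:
    """Split text into chunks of at most page_size words (words approximate tokens here)."""
    words = text.split()
    return [' '.join(words[i:i + page_size]) for i in range(0, len(words), page_size)]


def cap_to_budget(chunks: list, max_ref_token: int = 4000) -> list:
    """Keep chunks in order until adding the next one would exceed the budget."""
    kept, used = [], 0
    for chunk in chunks:
        cost = len(chunk.split())
        if used + cost > max_ref_token:
            break
        kept.append(chunk)
        used += cost
    return kept


# A 1200-word document split with page_size=500 gives chunks of 500, 500 and 200 words.
text = ' '.join(f'w{i}' for i in range(1200))
chunks = split_into_chunks(text, page_size=500)
print([len(c.split()) for c in chunks])

# With a budget of 1000 words only the first two chunks fit.
print(len(cap_to_budget(chunks, max_ref_token=1000)))
```

Under these assumptions, a smaller parser_page_size yields finer-grained chunks, while max_ref_token bounds how much text is handed back in total.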

Usage Example

Basic Retrieval

from qwen_agent.tools import Retrieval

tool = Retrieval()

result = tool.call({
    'query': 'installation requirements',
    'files': ['user_manual.pdf', 'https://docs.example.com/guide.html']
})

print(result)
# Example output: [{'url': 'user_manual.pdf', 'text': ['Chapter 2: Requirements...']}]
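
The result shown above is a list of entries, each with a url and a list of text passages. A small sketch of turning that structure into a single reference string, for example to paste into a prompt, might look like this. The helper format_references is hypothetical, and the sample data simply mirrors the example output above.

```python
def format_references(results: list) -> str:
    """Join retrieved passages into one string, labelling each group with its source."""
    sections = []
    for item in results:
        body = '\n'.join(item['text'])
        sections.append(f"[source: {item['url']}]\n{body}")
    return '\n\n'.join(sections)


# Sample data in the same shape as the example output above.
sample = [{'url': 'user_manual.pdf', 'text': ['Chapter 2: Requirements...']}]
print(format_references(sample))
```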

Custom Configuration

tool = Retrieval({
    'max_ref_token': 8000,
    'parser_page_size': 300,
    'rag_searchers': ['hybrid_search']
})

result = tool.call({
    'query': 'API authentication methods',
    'files': ['api_docs.pdf']
})

With Agent

from qwen_agent.agents import FnCallAgent

agent = FnCallAgent(
    function_list=[{
        'name': 'retrieval',
        'max_ref_token': 6000
    }],
    llm={'model': 'qwen-max'}
)

messages = [{
    'role': 'user',
    'content': 'What does the manual say about troubleshooting?',
    'files': ['manual.pdf']
}]

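# agent.run() streams lists of messages; when the input messages are dicts,
# the responses are dicts as well, so index by key.
for response in agent.run(messages):
    print(response[-1]['content'])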

Supported File Types

  • PDF (.pdf)
  • Word (.docx)
  • PowerPoint (.pptx)
  • Excel (.xlsx)
  • Text (.txt)
  • HTML (URLs)
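
Based on the list above, a caller could filter the files list before passing it to the tool. The following is a sketch under assumptions: the helper looks_supported is hypothetical, it only inspects the file extension, and it treats every http(s) URL as a candidate without checking what the URL actually serves.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions taken from the list above.
SUPPORTED_SUFFIXES = {'.pdf', '.docx', '.pptx', '.xlsx', '.txt'}


def looks_supported(file_ref: str) -> bool:
    """Return True for http(s) URLs and for paths whose extension is in the supported list."""
    if urlparse(file_ref).scheme in ('http', 'https'):
        return True
    return PurePosixPath(file_ref).suffix.lower() in SUPPORTED_SUFFIXES


candidates = ['user_manual.pdf', 'https://docs.example.com/guide.html', 'notes.md']
print([f for f in candidates if looks_supported(f)])
```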

Search Strategies

# BM25-based keyword matching
tool = Retrieval({'rag_searchers': ['keyword_search']})

# Semantic similarity search
tool = Retrieval({'rag_searchers': ['vector_search']})

# Combines keyword + vector search
tool = Retrieval({'rag_searchers': ['hybrid_search']})
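
For intuition about the strategies above, here is a self-contained sketch of BM25 keyword scoring plus one possible way to blend keyword and vector scores. It is not the library's implementation: the tokenization is plain whitespace splitting, the k1 and b values are common textbook defaults, and the weighted min-max blend in hybrid_scores is an assumption, since this page does not say how hybrid_search combines the two.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Score each document against the query with the BM25 formula."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_max(values: list) -> list:
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(keyword: list, vector: list, weight: float = 0.5) -> list:
    """Hypothetical blend: weighted sum of min-max normalised keyword and vector scores."""
    k, v = min_max(keyword), min_max(vector)
    return [weight * ks + (1 - weight) * vs for ks, vs in zip(k, v)]


docs = [
    'install the package with pip',
    'authentication uses api keys',
    'pip install requirements and system requirements',
]
keyword = bm25_scores('install requirements', docs)
# Vector scores would normally come from an embedding model; these are made-up placeholders.
vector = [0.2, 0.1, 0.9]
print(hybrid_scores(keyword, vector))
```

In this toy corpus the third document ranks highest on keywords because it contains both query terms, and the second scores zero because it contains neither.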
