Overview
DocParser extracts the content of a document and splits it into token-limited chunks for downstream processing.
Class Signature
from qwen_agent.tools import DocParser

class DocParser(BaseTool):
    name = 'doc_parser'
    description = 'Extract and chunk document content'
    parameters = {
        'type': 'object',
        'properties': {
            'url': {
                'description': 'File path or downloadable URL',
                'type': 'string',
            }
        },
        'required': ['url'],
    }
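As a rough illustration of how a schema like this can be used, the sketch below checks call arguments against the 'required' list and the declared string type. The helper check_params is hypothetical and not part of qwen_agent; the library may validate arguments differently.

```python
# Hypothetical helper, not part of qwen_agent: validates call arguments
# against a JSON-Schema-style 'parameters' dict like the one DocParser declares.
PARAMETERS = {
    'type': 'object',
    'properties': {
        'url': {
            'description': 'File path or downloadable URL',
            'type': 'string',
        }
    },
    'required': ['url'],
}


def check_params(params: dict, schema: dict = PARAMETERS) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Every name listed under 'required' must be present.
    for key in schema.get('required', []):
        if key not in params:
            problems.append(f"missing required parameter: {key}")
    # Only the 'string' type is checked, since it is the only one declared.
    for key, spec in schema.get('properties', {}).items():
        if key in params and spec.get('type') == 'string' and not isinstance(params[key], str):
            problems.append(f"parameter {key} must be a string")
    return problems


print(check_params({'url': 'research_paper.pdf'}))
print(check_params({}))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.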
Parameters
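url (string, required): Document path, either a local file path or a downloadable URL.
parser_page_size: Target chunk size in tokens.
max_ref_token: Maximum token count before chunking. A document below this limit is returned whole.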
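The interaction between the target chunk size and the chunking threshold can be sketched as follows. This is a simplified illustration, not DocParser's implementation: it counts whitespace-separated words as tokens, and the names chunk_text, chunk_size, and whole_doc_limit are hypothetical.

```python
def chunk_text(text: str, chunk_size: int, whole_doc_limit: int) -> list:
    """Split text into chunks of at most chunk_size 'tokens'.

    Assumption: one whitespace-separated word counts as one token.
    A real tokenizer would give different counts.
    """
    words = text.split()
    # A document below the limit is returned whole, as a single chunk.
    if len(words) < whole_doc_limit:
        return [{'content': text, 'token': len(words)}]
    chunks = []
    # Otherwise, walk through the words in steps of chunk_size.
    for start in range(0, len(words), chunk_size):
        piece = words[start:start + chunk_size]
        chunks.append({'content': ' '.join(piece), 'token': len(piece)})
    return chunks


short = chunk_text('a b c', chunk_size=2, whole_doc_limit=10)
long = chunk_text('a b c d e', chunk_size=2, whole_doc_limit=4)
print(len(short), len(long))
```

In this sketch the last chunk may be smaller than the target size, which is consistent with the 950-token chunk shown in the usage example.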
Usage Example
from qwen_agent.tools import DocParser

tool = DocParser()
result = tool.call({
    'url': 'research_paper.pdf'
}, parser_page_size=1000)
print(result)

# Returns:
# {
#     'url': 'research_paper.pdf',
#     'title': 'Research Paper Title',
#     'raw': [
#         {'content': 'Chunk 1...', 'token': 950, 'metadata': {...}},
#         {'content': 'Chunk 2...', 'token': 1000, 'metadata': {...}}
#     ]
# }
Return Format
{
    'url': 'document.pdf',
    'title': 'Document Title',
    'raw': [
        {
            'content': 'Text content...',
            'token': 500,
            'metadata': {
                'source': 'document.pdf',
                'title': 'Document Title',
                'chunk_id': 0
            }
        },
        ...
    ]
}
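A short sketch of how a result in this format might be consumed. The sample dictionary is made up to match the structure shown above, and the helper summarize is hypothetical.

```python
# Sample data shaped like the documented return format (values are illustrative).
result = {
    'url': 'document.pdf',
    'title': 'Document Title',
    'raw': [
        {'content': 'First chunk text', 'token': 500,
         'metadata': {'source': 'document.pdf', 'title': 'Document Title', 'chunk_id': 0}},
        {'content': 'Second chunk text', 'token': 320,
         'metadata': {'source': 'document.pdf', 'title': 'Document Title', 'chunk_id': 1}},
    ],
}


def summarize(result: dict) -> dict:
    """Collect the chunk count, total tokens, and chunk ids from a result."""
    chunks = result['raw']
    return {
        'title': result['title'],
        'num_chunks': len(chunks),
        'total_tokens': sum(chunk['token'] for chunk in chunks),
        'chunk_ids': [chunk['metadata']['chunk_id'] for chunk in chunks],
    }


summary = summarize(result)
print(summary)
```

Summing the per-chunk 'token' values gives a quick estimate of how much context the whole document would consume.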
See Also