Qwen-Agent provides specialized agents for handling long documents like research papers, books, and technical documentation. These agents use parallel processing and retrieval-augmented generation (RAG) for efficient analysis.
ParallelDocQA Agent
The ParallelDocQA agent processes documents in parallel for faster question answering.
Basic Usage
```python
from qwen_agent.agents.doc_qa import ParallelDocQA

def test():
    bot = ParallelDocQA(
        llm={
            'model': 'qwen2.5-72b-instruct',
            'generate_cfg': {
                'max_retries': 10
            }
        }
    )
    messages = [
        {
            'role': 'user',
            'content': [
                {'text': 'Describe the experimental methods'},
                {'file': 'https://arxiv.org/pdf/2310.08560.pdf'}
            ]
        },
    ]
    for rsp in bot.run(messages):
        print('bot response:', rsp)

if __name__ == '__main__':
    test()
```
How It Works
1. Document Chunking: the document is split into manageable chunks (pages or sections).
2. Parallel Processing: multiple chunks are processed simultaneously using parallel queries.
3. RAG Retrieval: relevant chunks are retrieved based on the question.
4. Answer Synthesis: retrieved content is combined to generate a comprehensive answer.
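The four steps above can be sketched in plain Python. This is a hypothetical illustration, not Qwen-Agent's actual implementation: a naive keyword count stands in for real RAG retrieval, and "synthesis" is plain concatenation rather than an LLM call.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(text, chunk_size=800):
    """Step 1: split the document into fixed-size chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def score_chunk(chunk, question):
    """Toy relevance score: count question words present in the chunk."""
    return sum(w in chunk.lower() for w in question.lower().split())

def answer(question, document, chunk_size=800, top_k=2, max_workers=4):
    chunks = split_into_chunks(document, chunk_size)
    # Step 2: score chunks in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda c: score_chunk(c, question), chunks))
    # Step 3: retrieve the top-k most relevant chunks.
    ranked = sorted(zip(scores, chunks), key=lambda p: p[0], reverse=True)
    retrieved = [c for _, c in ranked[:top_k]]
    # Step 4: synthesize (here, just concatenate the retrieved context).
    return ' '.join(retrieved)
```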
Speed: 2-3x faster than sequential processing for long documents
Accuracy: high accuracy through comprehensive chunk analysis
Memory: efficient memory usage with streaming
Scalability: handles documents of 100+ pages
Web UI Application
Create a user-friendly interface for document QA:
```python
from qwen_agent.agents.doc_qa import ParallelDocQA
from qwen_agent.gui import WebUI

def app_gui():
    # Define the agent
    bot = ParallelDocQA(
        llm={
            'model': 'qwen2.5-72b-instruct',
            'generate_cfg': {
                'max_retries': 10
            }
        },
        description='Parallel QA that answers with RAG-retrieved content. '
                    'Supported file types: PDF/Word/PPT/TXT/HTML. '
                    'Asking in the same language as the material works best.',
    )
    chatbot_config = {
        'prompt.suggestions': [
            {'text': 'Describe the experimental methods'},
            {'text': 'What are the main contributions of this paper?'},
            {'text': 'Summarize Chapter 3'},
        ]
    }
    WebUI(bot, chatbot_config=chatbot_config).run()

if __name__ == '__main__':
    app_gui()
```
Features
Drag-and-drop file upload
Multi-format support: PDF, Word, PowerPoint, TXT, HTML
Streaming responses: see answers as they're generated
Multi-turn conversations: build on previous questions
Suggested prompts: guide users with example questions
PDF: Full Support
Text extraction from all pages
Handles multi-column layouts
Preserves document structure
Supports both scanned and native PDFs (with OCR)

```python
messages = [{
    'role': 'user',
    'content': [
        {'text': 'What is the main conclusion?'},
        {'file': 'paper.pdf'}
    ]
}]
```

Word Documents: Full Support
.doc and .docx formats
Preserves formatting and structure
Handles tables and lists

```python
messages = [{
    'role': 'user',
    'content': [
        {'text': 'Summarize the key points'},
        {'file': 'report.docx'}
    ]
}]
```

PowerPoint: Full Support
.ppt and .pptx formats
Extracts text from slides
Preserves slide order

```python
messages = [{
    'role': 'user',
    'content': [
        {'text': 'What is covered in the presentation?'},
        {'file': 'slides.pptx'}
    ]
}]
```

Text & HTML: Full Support
Plain text files (.txt)
HTML documents (.html, .htm)
Markdown files (.md)

```python
messages = [{
    'role': 'user',
    'content': [
        {'text': 'Extract the main ideas'},
        {'file': 'document.txt'}
    ]
}]
```
Remote Files
Load documents from URLs:
```python
messages = [{
    'role': 'user',
    'content': [
        {'text': 'Explain the transformer architecture'},
        {'file': 'https://arxiv.org/pdf/1706.03762.pdf'}
    ]
}]
for response in bot.run(messages):
    print(response)
```
The agent automatically downloads and processes remote files. Ensure URLs are publicly accessible.
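As a sanity check before sending a message, you might distinguish remote URLs from local paths. This is a small hypothetical helper, not part of Qwen-Agent's API:

```python
from urllib.parse import urlparse

def looks_like_remote_file(ref):
    """Treat a reference as a remote file only when it is an http(s) URL
    with a hostname; anything else is assumed to be a local path."""
    parsed = urlparse(ref)
    return parsed.scheme in ('http', 'https') and bool(parsed.netloc)
```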
Configuration Options
LLM Configuration
```python
import os

bot = ParallelDocQA(
    llm={
        'model': 'qwen2.5-72b-instruct',
        'model_server': 'dashscope',  # or custom URL
        'api_key': os.getenv('DASHSCOPE_API_KEY'),
        'generate_cfg': {
            'max_retries': 10,
            'temperature': 0.1,  # Lower for more factual responses
            'top_p': 0.9,
        }
    }
)
```
Agent Configuration
```python
bot = ParallelDocQA(
    llm=llm_cfg,
    name='Document Assistant',
    description='Expert at analyzing long documents',
    # RAG configuration
    rag_cfg={
        'chunk_size': 800,     # Characters per chunk
        'chunk_overlap': 100,  # Overlap between chunks
        'top_k': 5,            # Number of chunks to retrieve
    },
    # Parallel processing
    max_workers=4,  # Number of parallel workers
)
```
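To make the `chunk_size`/`chunk_overlap` settings concrete, here is a hypothetical sketch of overlapping character-window chunking (the real chunker may split on pages or sections instead of raw characters):

```python
def chunk_with_overlap(text, chunk_size=800, chunk_overlap=100):
    """Split text into windows of chunk_size characters; each window
    starts chunk_overlap characters before the previous one ended, so
    content near a boundary is never lost to a hard cut."""
    if chunk_overlap >= chunk_size:
        raise ValueError('chunk_overlap must be smaller than chunk_size')
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With the defaults above, each 800-character chunk repeats the last 100 characters of its predecessor.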
Advanced Usage Patterns
Multi-Document Analysis
Analyze multiple documents simultaneously:
```python
messages = [{
    'role': 'user',
    'content': [
        {'text': 'Compare the methodologies in these papers'},
        {'file': 'paper1.pdf'},
        {'file': 'paper2.pdf'},
        {'file': 'paper3.pdf'},
    ]
}]
for response in bot.run(messages):
    print(response)
```
Multi-Turn Conversations
Build on previous questions:
```python
messages = []

# First question
messages.append({
    'role': 'user',
    'content': [
        {'text': 'What is the main contribution?'},
        {'file': 'paper.pdf'}
    ]
})
for response in bot.run(messages):
    print(response)
messages.extend(response)  # keep the final answer in the history

# Follow-up question
messages.append({
    'role': 'user',
    'content': 'How does this compare to prior work?'
})
for response in bot.run(messages):
    print(response)
messages.extend(response)

# Another follow-up
messages.append({
    'role': 'user',
    'content': 'What are the limitations?'
})
for response in bot.run(messages):
    print(response)
```
Specific Section Queries
Ask about specific parts:
```python
queries = [
    'Summarize the introduction',
    'What methodology is used?',
    'What are the key results?',
    'What are the conclusions?',
]
for query in queries:
    messages = [{
        'role': 'user',
        'content': [
            {'text': query},
            {'file': 'paper.pdf'}
        ]
    }]
    print(f"\n### {query}")
    for response in bot.run(messages):
        print(response[-1]['content'])
```
Virtual Memory QA
For extremely large documents, use VirtualMemoryQA:
```python
from qwen_agent.agents import VirtualMemoryQA

bot = VirtualMemoryQA(
    llm={'model': 'qwen2.5-72b-instruct'},
    memory_limit=100000,  # Tokens to keep in memory
)
messages = [{
    'role': 'user',
    'content': [
        {'text': 'Summarize this entire book'},
        {'file': 'large_book.pdf'}  # 500+ pages
    ]
}]
for response in bot.run(messages):
    print(response)
```
Virtual Memory Features
Unlimited Size: handle documents of any size
Smart Caching: keep the most relevant content in memory
Context Management: automatically manage token limits
Incremental Loading: load content as needed
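The caching and incremental-loading ideas above can be sketched as an LRU cache keyed by chunk id with a token budget. This is a toy illustration under assumed semantics, not VirtualMemoryQA's actual internals:

```python
from collections import OrderedDict

class ChunkCache:
    """Keep recently used chunks in memory up to a token budget,
    evicting the least recently used chunks first."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit  # token budget
        self.used = 0
        self.chunks = OrderedDict()  # chunk_id -> (tokens, text)

    def get(self, chunk_id, loader):
        if chunk_id in self.chunks:
            self.chunks.move_to_end(chunk_id)  # mark as recently used
            return self.chunks[chunk_id][1]
        text = loader(chunk_id)  # incremental loading on a cache miss
        tokens = len(text.split())
        # Evict LRU entries until the new chunk fits the budget.
        while self.chunks and self.used + tokens > self.memory_limit:
            _, (evicted_tokens, _) = self.chunks.popitem(last=False)
            self.used -= evicted_tokens
        self.chunks[chunk_id] = (tokens, text)
        self.used += tokens
        return text
```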
Best Practices
Query Optimization
Instead of “Tell me about this paper”, ask “What methodology does this paper use to evaluate performance?”
Ask questions in the same language as the document for best results.
Split complex questions into multiple simpler queries for better accuracy.
Mention specific sections or chapters when relevant: “What does Chapter 3 say about…?”
```python
# For faster responses on shorter documents
bot = ParallelDocQA(
    llm={
        'model': 'qwen-turbo',  # Faster model
        'generate_cfg': {
            'temperature': 0.1,
        }
    },
    rag_cfg={
        'chunk_size': 500,  # Smaller chunks
        'top_k': 3,         # Fewer chunks
    },
    max_workers=8,  # More parallelism
)
```
```python
# For better accuracy on complex documents
bot = ParallelDocQA(
    llm={
        'model': 'qwen-max',  # More capable model
        'generate_cfg': {
            'temperature': 0.0,  # More deterministic
        }
    },
    rag_cfg={
        'chunk_size': 1000,  # Larger context
        'top_k': 7,          # More chunks
    },
)
```
Common Use Cases
Academic Research
```python
# Analyze research papers
bot = ParallelDocQA(llm={'model': 'qwen-max'})

questions = [
    'What is the research question?',
    'What methodology is used?',
    'What are the main findings?',
    'What are the limitations?',
    'What future work is suggested?',
]
for question in questions:
    messages = [{'role': 'user', 'content': [
        {'text': question},
        {'file': 'research_paper.pdf'}
    ]}]
    for response in bot.run(messages):
        print(f"{question}: {response[-1]['content']}")
```
Legal Document Analysis
```python
# Review contracts and legal documents
bot = ParallelDocQA(
    llm={'model': 'qwen-max'},
    rag_cfg={'chunk_size': 1200}  # Larger chunks for legal text
)
messages = [{
    'role': 'user',
    'content': [
        {'text': 'Identify key obligations and deadlines'},
        {'file': 'contract.pdf'}
    ]
}]
```
Technical Documentation
```python
# Navigate technical manuals
bot = ParallelDocQA(llm={'model': 'qwen-max'})
messages = [{
    'role': 'user',
    'content': [
        {'text': 'How do I configure SSL certificates?'},
        {'file': 'technical_manual.pdf'}
    ]
}]
```
Troubleshooting
Slow processing:
Reduce max_workers if API rate limits are hit
Use a faster model like qwen-turbo
Reduce chunk_size and top_k
Ensure a good network connection for remote files

Inaccurate or incomplete answers:
Increase top_k to retrieve more chunks
Use a more capable model like qwen-max
Lower temperature to 0.0 for more factual responses
Try rephrasing your question more specifically

File loading failures:
Verify the file format is supported
Check that the file is not corrupted
Ensure remote URLs are accessible
Try converting to a different format

Memory issues:
Use VirtualMemoryQA for very large documents
Reduce chunk_size
Process documents in sections
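For the rate-limit case, a generic exponential-backoff wrapper can help when driving the agent in a loop. This is a sketch (note that `max_retries` in `generate_cfg` already retries inside Qwen-Agent); `RuntimeError` stands in for whatever exception your API client raises on throttling:

```python
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn() with exponential backoff: wait base_delay, 2*base_delay,
    4*base_delay, ... between attempts, re-raising after the last one."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The injectable `sleep` parameter keeps the helper testable without real delays.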
Limitations
Be aware of these limitations:
Scanned PDFs require OCR which may have errors
Very large files (>100MB) may be slow to process
Complex formatting may not be fully preserved
Images and figures are not analyzed (use vision models)
Subject to LLM context window limits
Next Steps
Assistant Demos: combine document QA with other capabilities
Multi-Agent Chat: use multiple agents for collaborative analysis