Fetch Tool
The fetch tool retrieves complete document content from WebHelp documentation sites and converts it from HTML to clean Markdown. This enables AI assistants to analyze, cite, and answer questions with full context.
How It Works
When you invoke the fetch tool with a document ID, it:
- Resolves the document URL from the ID (format: index:path)
- Downloads the HTML content from the WebHelp site
- Extracts the main <article> element to remove navigation and UI
- Converts HTML to Markdown using Turndown
- Returns the structured content with title, text, and metadata
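The extraction step in the list above can be illustrated with a small, dependency-free sketch. The helper name `extractArticle` is hypothetical, and the regular-expression approach is a simplification chosen to keep the example self-contained; the actual server may well use an HTML parser instead.

```typescript
// Hypothetical helper (not taken from the server source): returns the inner
// HTML of the first <article> element, or the whole input when none is found.
// Assumption: <article> elements are not nested and the opening tag contains
// no ">" inside attribute values.
function extractArticle(html: string): string {
  const match = /<article(?:\s[^>]*)?>([\s\S]*?)<\/article>/i.exec(html);
  // Fall back to the full page so the caller always has something to convert.
  return match?.[1]?.trim() ?? html;
}

const page =
  '<html><body><nav>Menu</nav>' +
  '<article class="topic"><h1>Intro</h1><p>Hi</p></article>' +
  '<footer>Footer</footer></body></html>';

// Logs only the article markup, without the nav and footer.
console.log(extractArticle(page));
```

Falling back to the full page when no `<article>` is present is one possible design choice; a stricter variant could report an error instead.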
Implementation
Here’s the complete fetch implementation from the source code:
Document ID Resolution
Document IDs use a composite format to support federated search: index:path
- index: zero-based index into the baseUrls array
- path: relative path to the document
Examples:
- 0:topics/introduction.html (first site, introduction topic)
- 1:reference/api.html (second site, API reference)
- 0:topics/config/advanced.html (path with multiple segments)
The implementation splits on the first colon only, so the path portion may itself contain colons.
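As a rough illustration of the first-colon split, here is a minimal sketch. The function names `parseDocumentId` and `resolveDocumentUrl`, the error messages, and the `example.com` URL are all assumptions made for the example; only the `index:path` format and the `baseUrls` array come from the description above.

```typescript
interface ParsedDocumentId {
  index: number; // zero-based position in baseUrls
  path: string;  // relative document path, may contain further colons
}

// Hypothetical helper: split on the FIRST colon only.
function parseDocumentId(id: string): ParsedDocumentId {
  const colon = id.indexOf(":");
  const indexPart = colon === -1 ? "" : id.slice(0, colon);
  const path = colon === -1 ? "" : id.slice(colon + 1);
  if (!/^\d+$/.test(indexPart) || path === "") {
    throw new Error(`Invalid document ID: ${id}`);
  }
  return { index: Number(indexPart), path };
}

// Hypothetical helper: join the selected base URL and the relative path.
function resolveDocumentUrl(id: string, baseUrls: string[]): string {
  const { index, path } = parseDocumentId(id);
  const base = baseUrls[index];
  if (base === undefined) {
    throw new Error(`No site configured at index ${index}`);
  }
  // Normalize slashes so exactly one separates base and path.
  return `${base.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}

const baseUrls = ["https://example.com/docs/"]; // placeholder site
console.log(resolveDocumentUrl("0:topics/introduction.html", baseUrls));
```

The sketch joins strings rather than calling `new URL(path, base)`, because a relative path that contains a colon could otherwise be interpreted as a URL scheme.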
Article Extraction
The fetch tool extracts only the main content to avoid sending navigation, headers, and footers to the AI.
HTML to Markdown Conversion
The server uses Turndown to convert HTML to Markdown:
- ATX-style headings (#, ##, ###)
- Fenced code blocks with language detection
- Preserved code formatting and syntax
- Tables converted to Markdown tables
- Links and images preserved
- Lists properly formatted
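The first two items in the list correspond to documented Turndown constructor options. The fragment below shows those options as a plain object so it runs without the `turndown` package installed. Whether the server uses exactly these settings is an assumption, and the variable name `turndownOptions` is hypothetical.

```typescript
// Option names and values are Turndown's documented constructor options.
// That the server configures Turndown this way is an assumption.
const turndownOptions = {
  headingStyle: "atx",      // "# Heading" instead of underlined (setext) headings
  codeBlockStyle: "fenced", // triple-backtick fences instead of indented code
} as const;

// With the turndown package installed, the options would be applied like this:
//   import TurndownService from "turndown";
//   const markdown = new TurndownService(turndownOptions).turndown(articleHtml);

console.log(turndownOptions);
```

Turndown's core rules do not cover tables; table conversion typically comes from the separate `turndown-plugin-gfm` package. This page does not say whether the server relies on that plugin.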
MCP Tool Definition
Here’s how the fetch tool is exposed via the Model Context Protocol:
Parameters
Document ID from search results. Format: index:path, where index is the site index and path is the relative document path.
Return Value
The fetch tool returns a JSON object with the document content:
Result Fields
The document ID that was requested
Document title extracted from the HTML <title> tag or page metadata
Complete document content converted to Markdown format
Full URL to the original HTML document
Additional metadata if available (currently unused)
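Taken together, the parameter and result fields described above suggest shapes along the following lines. This is a sketch, not the server's actual definition: the field names (`id`, `title`, `text`, `url`, `metadata`), the tool name `fetch`, the description string, and the example values are assumptions inferred from the field descriptions. The `name` / `description` / `inputSchema` layout is the standard way MCP describes a tool.

```typescript
// Assumed result shape, one property per field described above.
interface FetchResult {
  id: string;                         // the document ID that was requested
  title: string;                      // from the HTML <title> tag or page metadata
  text: string;                       // document content as Markdown
  url: string;                        // full URL of the original HTML document
  metadata?: Record<string, unknown>; // currently unused
}

// MCP tools are described by a name, a description, and a JSON Schema for input.
const fetchToolDefinition = {
  name: "fetch",
  description: "Retrieve a WebHelp document by ID and return it as Markdown",
  inputSchema: {
    type: "object",
    properties: {
      id: {
        type: "string",
        description: "Document ID from search results, formatted as index:path",
      },
    },
    required: ["id"],
  },
};

// Illustrative values only; the URL and content are placeholders.
const exampleResult: FetchResult = {
  id: "0:topics/introduction.html",
  title: "Introduction",
  text: "# Introduction\n\nPlaceholder body text.",
  url: "https://example.com/docs/topics/introduction.html",
};

console.log(JSON.stringify(fetchToolDefinition, null, 2));
console.log(JSON.stringify(exampleResult, null, 2));
```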
Usage Examples
Claude Desktop Workflow
Fetching DITA-OT Documentation
0:topics/output-formats.html
Result: Complete guide to DITA-OT output formats in Markdown
Fetching Oxygen XML Documentation
0:topics/streamline-with-content-completion.html
Result: Full content completion documentation with examples
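At the protocol level, an MCP client invokes a tool with a JSON-RPC `tools/call` request. The sketch below builds such a request for the first example ID above. The helper name `buildFetchCall`, the tool name `fetch`, and the argument name `id` are assumptions; most clients, Claude Desktop included, construct this message for you.

```typescript
// Hypothetical helper: builds a JSON-RPC 2.0 "tools/call" request body.
// Assumption: the tool is registered as "fetch" and takes an "id" argument.
function buildFetchCall(documentId: string, requestId: number) {
  return {
    jsonrpc: "2.0",
    id: requestId,
    method: "tools/call",
    params: {
      name: "fetch",
      arguments: { id: documentId },
    },
  };
}

const request = buildFetchCall("0:topics/output-formats.html", 1);
console.log(JSON.stringify(request, null, 2));
```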
Error Handling
Invalid Document ID
Document Not Found
- Document was moved or deleted
- Search index is outdated
- Incorrect document path
Conversion Failures
If Turndown encounters problematic HTML, it may produce malformed Markdown. The server doesn’t validate output quality.
Performance Considerations
Download Time
Each fetch downloads the complete HTML page from the WebHelp site:
- Typical page size: 10-100 KB
- Download time: 100-500ms depending on network and server
- No caching: each fetch downloads fresh content
Conversion Time
HTML to Markdown conversion is fast but depends on document size:
- Small documents (< 50 KB): < 50ms
- Medium documents (50-200 KB): 50-200ms
- Large documents (> 200 KB): 200ms-1s
The MCP server processes requests synchronously. Large documents may cause timeouts in some AI tools.
Best Practices
Search First
Always search before fetching to find the right document IDs
Use Exact IDs
Never construct IDs manually — always use search results
Fetch Selectively
Only fetch documents you need — searches are much faster
Check URLs
Include the original URL in citations for user reference
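On the client side, these practices might translate into helpers like the ones below. This is a sketch under stated assumptions: the `SearchHit` and `FetchedDocument` shapes and both function names are hypothetical, and search results are assumed to arrive ordered by relevance.

```typescript
// Assumed shapes; the real search and fetch results may carry more fields.
interface SearchHit { id: string; title: string; }
interface FetchedDocument { id: string; title: string; url: string; }

// "Use Exact IDs" and "Fetch Selectively": keep only the top `limit` hits and
// pass their IDs through unchanged rather than constructing IDs by hand.
function selectIdsToFetch(hits: SearchHit[], limit: number): string[] {
  return hits.slice(0, Math.max(0, limit)).map((hit) => hit.id);
}

// "Check URLs": pair the title with the original URL as a Markdown link.
function formatCitation(doc: FetchedDocument): string {
  return `[${doc.title}](${doc.url})`;
}

// Placeholder data for illustration.
const hits: SearchHit[] = [
  { id: "0:topics/output-formats.html", title: "Output formats" },
  { id: "0:topics/introduction.html", title: "Introduction" },
  { id: "1:reference/api.html", title: "API reference" },
];

console.log(selectIdsToFetch(hits, 2));
console.log(
  formatCitation({
    id: "0:topics/output-formats.html",
    title: "Output formats",
    url: "https://example.com/docs/topics/output-formats.html",
  }),
);
```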
Markdown Quality
The conversion quality depends on the WebHelp HTML structure.
Well-Converted Elements:
- Headings and paragraphs
- Code blocks with syntax highlighting
- Tables and lists
- Links and images
- Bold and italic text
Elements That May Not Convert Well:
- Custom WebHelp widgets
- JavaScript-rendered content
- Complex CSS layouts
- Embedded multimedia
Next Steps
Search Tool
Learn how to find documents to fetch
Federated Search
Fetch from multiple sites
Integration Guide
Set up with Claude Desktop
Deploy Your Own
Host a private instance