
Overview

The fetch tool retrieves the full content of a documentation page, identified by a document ID obtained from search results. It downloads the page's HTML, extracts the main article, and converts it to clean Markdown for processing and analysis. Use it after the search tool to get the full text of the most relevant documentation pages.

Parameters

id
string
required
Document ID from search results. The ID has the format index:relativePath where:
  • index is the base URL index (always 0 for single sites; 0, 1, 2, … for federated search)
  • relativePath is the relative path to the document
Examples:
  • 0:topics/wsdl-converter.html
  • 1:reference/api-guide.html
Always obtain the document ID from the search tool results. Do not construct IDs manually as the format may vary.

Response Format

The tool returns a JSON object containing the complete document information:
id
string
required
The document ID that was requested (same as the input parameter).
title
string
required
Document title extracted from the HTML page’s <title> element or heading.
text
string
required
The full document content converted to Markdown format. This includes:
  • All headings (converted to #, ##, ### style)
  • Paragraphs and text content
  • Lists (both ordered and unordered)
  • Code blocks and inline code
  • Links and references
  • Tables
The content is extracted from the <article> element to exclude navigation, headers, and footers.
url
string
required
Complete URL to the original documentation page. Useful for attribution and providing source links.
metadata
object
Optional metadata object (reserved for future use).
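
The fields above map naturally onto a TypeScript type. The sketch below shows one way to type and validate a parsed response on the client. The names FetchedDocument and isFetchedDocument are illustrative and not part of the tool's API, and the shape of metadata is assumed to be a plain object because its contents are not specified.

```typescript
// Hypothetical client-side type mirroring the documented response fields.
interface FetchedDocument {
  id: string;
  title: string;
  text: string;
  url: string;
  metadata?: Record<string, unknown>; // optional, reserved for future use
}

// Narrow an unknown parsed value to FetchedDocument by checking that
// every required field is present and is a string.
function isFetchedDocument(value: unknown): value is FetchedDocument {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return (['id', 'title', 'text', 'url'] as const).every(
    (key) => typeof record[key] === 'string'
  );
}
```

Running the guard on the result of JSON.parse before reading doc.text avoids runtime errors when the payload is not a document.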

Success Response

{
  "id": "0:topics/wsdl-converter.html",
  "title": "WSDL Converter",
  "text": "# WSDL Converter\n\nThe WSDL Converter allows you to convert WSDL files to different formats...\n\n## Usage\n\nTo use the WSDL converter:\n\n1. Open the WSDL file\n2. Select **Tools > Convert**\n3. Choose the target format\n\n...",
  "url": "https://www.oxygenxml.com/doc/versions/27.1/ug-editor/topics/wsdl-converter.html"
}

Error Response

When an error occurs, the tool returns an error message:
"Fetch failed: Unknown base URL index: 5"
Common error scenarios:
  • Invalid document ID: The ID format is incorrect or the index is out of range
  • Document not found: The specified document doesn’t exist (HTTP 404)
  • Network errors: Connection timeout or failure to download the document
  • Parsing errors: Unable to extract or convert the document content
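
Because a failure arrives as a message rather than a JSON object, calling JSON.parse on the payload and reading doc.text directly can throw or yield undefined. The following is a minimal sketch of a defensive wrapper, assuming the error arrives in the same text payload as a successful result. FetchOutcome and interpretFetchPayload are hypothetical names, not part of the tool.

```typescript
// Hypothetical result shape for client code; not part of the tool's API.
type Doc = { id: string; title: string; text: string; url: string };
type FetchOutcome =
  | { ok: true; doc: Doc }
  | { ok: false; error: string };

// Interpret the text payload of a fetch call. A successful call carries a
// JSON object; a failure carries a plain message instead.
function interpretFetchPayload(payload: string): FetchOutcome {
  let parsed: unknown;
  try {
    parsed = JSON.parse(payload);
  } catch {
    // Not JSON at all: treat the raw text as the error message.
    return { ok: false, error: payload };
  }
  if (typeof parsed === 'string') {
    // A JSON-encoded string such as "Fetch failed: ..." is also an error.
    return { ok: false, error: parsed };
  }
  if (typeof parsed === 'object' && parsed !== null && 'text' in parsed) {
    return { ok: true, doc: parsed as Doc };
  }
  return { ok: false, error: `Unexpected payload: ${payload}` };
}
```

MCP tool results can also carry an isError flag. Whether this server sets it is not stated here, so the sketch relies only on the payload text.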

Usage Examples

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

// The transport constructor expects a URL object rather than a string
const transport = new StreamableHTTPClientTransport(
  new URL('http://localhost:3000/www.oxygenxml.com/doc/versions/27.1/ug-editor')
);
const client = new Client({ name: 'my-app', version: '1.0.0' });
await client.connect(transport);

// Fetch document by ID
const response = await client.callTool({ 
  name: 'fetch', 
  arguments: { id: '0:topics/wsdl-converter.html' } 
});

const doc = JSON.parse(response.content[0].text);
console.log(`Title: ${doc.title}`);
console.log(`URL: ${doc.url}`);
console.log(`Content: ${doc.text.substring(0, 200)}...`);

Real-World Example

Here’s a complete example, adapted from the test suite, showing the search-and-fetch workflow:
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

const WEBHELP_ENDPOINT = 'www.oxygenxml.com/doc/versions/27.1/ug-editor';

// The transport constructor expects a URL object rather than a string
const transport = new StreamableHTTPClientTransport(
  new URL(`http://localhost:3000/${WEBHELP_ENDPOINT}`)
);
const client = new Client({ name: 'e2e-test-client', version: '1.0.0' });
await client.connect(transport);

// Step 1: Search for documents
const searchResp = await client.callTool({ 
  name: 'search', 
  arguments: { query: 'wsdl' } 
});

const searchResults = JSON.parse(searchResp.content[0].text);
const first = searchResults[0];

console.log(`Found ${searchResults.length} results`);
console.log(`Top result: ${first.title} (${first.id})`);

// Step 2: Fetch the document content
const fetchResp = await client.callTool({ 
  name: 'fetch', 
  arguments: { id: first.id } 
});

const doc = JSON.parse(fetchResp.content[0].text);

// Step 3: Verify the content is relevant
if (doc.text.toLowerCase().includes('wsdl')) {
  console.log('✓ Document contains relevant information');
  console.log(`Title: ${doc.title}`);
  console.log(`URL: ${doc.url}`);
  console.log(`Content length: ${doc.text.length} characters`);
} else {
  console.warn('Document may not be relevant');
}

await client.close();

Implementation Details

The fetch tool is implemented in app/[...site]/route.ts:85-112.

Content Processing Pipeline

  1. URL Resolution: The document ID is parsed to determine the base URL and relative path
  2. HTML Download: The complete HTML page is downloaded from the documentation site
  3. Article Extraction: The <article> element is extracted to get only the main content
  4. Markdown Conversion: HTML is converted to clean Markdown using Turndown
  5. Metadata Extraction: Title and URL are extracted for reference
The tool extracts only the <article> element to exclude navigation menus, headers, footers, and other UI elements. This ensures you get clean, focused content.

Markdown Conversion

The tool uses Turndown to convert HTML to Markdown with the following configuration:
  • Heading style: ATX (#, ##, ###)
  • Code block style: Fenced (```)
  • List marker: Dash (-)
This produces clean, readable Markdown that’s easy to process programmatically or display to users.
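
Expressed as Turndown options, the three settings above would look roughly like this. The option names (headingStyle, codeBlockStyle, bulletListMarker) are Turndown's documented options. The constant name is illustrative, and the server code may set further options not listed here.

```typescript
// Options object matching the three settings listed above.
// Intended to be passed as: new TurndownService(turndownOptions)
const turndownOptions = {
  headingStyle: 'atx',      // "# Heading" rather than underlined (setext) headings
  codeBlockStyle: 'fenced', // fenced blocks rather than indented code blocks
  bulletListMarker: '-',    // "-" for unordered list items
} as const;
```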

Document ID Format

Document IDs use the format index:relativePath:
  • Single site: All documents have index 0
    • Example: 0:topics/wsdl-converter.html
  • Federated search: Each base URL has its own index (0, 1, 2, …)
    • Example: 0:topics/editor.html (first site)
    • Example: 1:topics/author.html (second site)
The index determines which base URL to use when constructing the full document URL.
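
To make the mapping concrete, here is a minimal sketch of how an ID could be resolved to a full URL. It illustrates the format only and is not the server's actual code. The resolveDocumentUrl helper and the baseUrls list are assumptions, and clients should still take IDs from search results rather than build them.

```typescript
// Hypothetical helper: split "index:relativePath" at the first colon and
// resolve the path against the matching base URL.
function resolveDocumentUrl(id: string, baseUrls: string[]): string {
  const separator = id.indexOf(':');
  if (separator <= 0) {
    throw new Error(`Invalid document ID: ${id}`);
  }
  const indexPart = id.slice(0, separator);
  const relativePath = id.slice(separator + 1);
  if (!/^\d+$/.test(indexPart) || relativePath.length === 0) {
    throw new Error(`Invalid document ID: ${id}`);
  }
  const base = baseUrls[Number(indexPart)];
  if (base === undefined) {
    throw new Error(`Unknown base URL index: ${indexPart}`);
  }
  // A trailing slash makes the URL constructor append to the base path
  // instead of replacing its last segment.
  const normalizedBase = base.endsWith('/') ? base : `${base}/`;
  return new URL(relativePath, normalizedBase).toString();
}

// Example base URL list (assumed): one entry per federated site.
const exampleBaseUrls = [
  'https://www.oxygenxml.com/doc/versions/27.1/ug-editor',
  'https://example.com/docs/',
];
console.log(resolveDocumentUrl('0:topics/wsdl-converter.html', exampleBaseUrls));
```

Splitting at the first colon only keeps any later colons in the path intact.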

Performance Considerations

Fetching documents requires downloading and processing HTML content. For better performance:
  • Only fetch documents you need to analyze in detail
  • Use search results (title, URL) for initial filtering
  • Consider caching fetched documents if you need to reference them multiple times
  • Fetch multiple documents in parallel when possible
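
The last two suggestions can be combined in a small wrapper. The sketch below assumes a fetchOne function that performs a single fetch call and returns the parsed document. FetchOne, withCache, and fetchAll are illustrative names, and the cache is a plain in-memory Map with no eviction.

```typescript
// Hypothetical signature for whatever performs one fetch call.
type FetchOne<T> = (id: string) => Promise<T>;

// Wrap a fetch function with an in-memory cache. Storing the promise (not
// the resolved value) lets concurrent requests for the same ID share one call.
function withCache<T>(fetchOne: FetchOne<T>): FetchOne<T> {
  const cache = new Map<string, Promise<T>>();
  return (id) => {
    let pending = cache.get(id);
    if (pending === undefined) {
      pending = fetchOne(id).catch((error) => {
        cache.delete(id); // do not keep failed requests in the cache
        throw error;
      });
      cache.set(id, pending);
    }
    return pending;
  };
}

// Fetch several documents concurrently; results keep the order of `ids`.
function fetchAll<T>(ids: string[], fetchOne: FetchOne<T>): Promise<T[]> {
  return Promise.all(ids.map((id) => fetchOne(id)));
}
```

With the client from the usage examples, fetchOne would wrap client.callTool({ name: 'fetch', arguments: { id } }) and parse the returned text. Promise.all rejects as soon as any single fetch fails, so Promise.allSettled may be a better fit when partial results are acceptable.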

Typical Response Times

  • Small documents (<50KB): 200-500ms
  • Medium documents (50-200KB): 500-1500ms
  • Large documents (>200KB): 1500-3000ms
Response times depend on network latency, document size, and HTML complexity.

Use Cases

Documentation Analysis

// Analyze documentation coverage of a topic
const searchResp = await client.callTool({ 
  name: 'search', 
  arguments: { query: 'API authentication' } 
});

const results = JSON.parse(searchResp.content[0].text);
const topDocs = results.slice(0, 3);

for (const result of topDocs) {
  const fetchResp = await client.callTool({ 
    name: 'fetch', 
    arguments: { id: result.id } 
  });
  
  const doc = JSON.parse(fetchResp.content[0].text);
  
  // Check for specific information
  const hasCodeExamples = doc.text.includes('```');
  const mentionsOAuth = doc.text.toLowerCase().includes('oauth');
  
  console.log(`${doc.title}:`);
  console.log(`  - Has code examples: ${hasCodeExamples}`);
  console.log(`  - Mentions OAuth: ${mentionsOAuth}`);
}

Content Extraction

// Extract specific sections from documentation
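function extractSection(markdown: string, heading: string): string {
  // Escape regex metacharacters so headings such as "C++ (beta)" match literally
  const escaped = heading.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Match a level-2 heading at the start of a line, up to the next level-2 heading or the end of the text
  const regex = new RegExp(`(?:^|\\n)## ${escaped}[\\s\\S]*?(?=\\n## |$)`, 'i');
  const match = markdown.match(regex);
  return match ? match[0].trim() : '';
}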

const doc = JSON.parse(fetchResp.content[0].text);
const installSection = extractSection(doc.text, 'Installation');
const usageSection = extractSection(doc.text, 'Usage');

console.log('Installation instructions:', installSection);
console.log('Usage instructions:', usageSection);

Citation and References

// Generate citations for documentation references
function generateCitation(doc: { title: string; url: string }): string {
  return `[${doc.title}](${doc.url})`;
}

const doc = JSON.parse(fetchResp.content[0].text);
const citation = generateCitation(doc);

console.log(`For more information, see: ${citation}`);

Related Tools

  • search - Search documentation content to find relevant documents
