fetch

Overview

The fetch tool retrieves the full content of a documentation page using the document ID obtained from search results. It downloads the HTML content, extracts the main article content, and converts it to clean Markdown format for easy processing and analysis. This tool is designed to be used after the search tool to get detailed content from the most relevant documentation pages.

Parameters

id

string

required

Document ID from search results. The ID has the format index:relativePath where:

index is the base URL index (0 for a single site; 0, 1, 2, … for federated search)
relativePath is the relative path to the document

Examples:

0:topics/wsdl-converter.html
1:reference/api-guide.html

Always obtain the document ID from the search tool results. Do not construct IDs manually as the format may vary.

Response Format

The tool returns a JSON object containing the complete document information:

id

string

required

The document ID that was requested (same as the input parameter).

title

string

required

Document title extracted from the HTML page’s <title> element or heading.

text

string

required

The full document content converted to Markdown format. This includes:

All headings (converted to #, ##, ### style)
Paragraphs and text content
Lists (both ordered and unordered)
Code blocks and inline code
Links and references
Tables

The content is extracted from the <article> element to exclude navigation, headers, and footers.

url

string

required

Complete URL to the original documentation page. Useful for attribution and providing source links.

metadata

object

Optional metadata object (reserved for future use).
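
The fields above map naturally onto a TypeScript type. The following is a minimal sketch of such a type with a runtime check; the names `FetchResult` and `isFetchResult` are illustrative assumptions and are not part of the tool's API.

```typescript
// Hypothetical type mirroring the documented response fields.
interface FetchResult {
  id: string;
  title: string;
  text: string;
  url: string;
  metadata?: Record<string, unknown>; // optional, reserved for future use
}

// Runtime check that a parsed JSON value has the documented shape.
function isFetchResult(value: unknown): value is FetchResult {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const required = ['id', 'title', 'text', 'url'];
  const hasStrings = required.every((key) => typeof v[key] === 'string');
  const metadata = v['metadata'];
  const metadataOk =
    metadata === undefined ||
    (typeof metadata === 'object' && metadata !== null);
  return hasStrings && metadataOk;
}
```

A guard like this lets calling code narrow the result of `JSON.parse` before reading `title`, `text`, or `url`.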

Success Response

{
  "id": "0:topics/wsdl-converter.html",
  "title": "WSDL Converter",
  "text": "# WSDL Converter\n\nThe WSDL Converter allows you to convert WSDL files to different formats...\n\n## Usage\n\nTo use the WSDL converter:\n\n1. Open the WSDL file\n2. Select **Tools > Convert**\n3. Choose the target format\n\n...",
  "url": "https://www.oxygenxml.com/doc/versions/27.1/ug-editor/topics/wsdl-converter.html"
}

Error Response

When an error occurs, the tool returns an error message:

"Fetch failed: Unknown base URL index: 5"

Common error scenarios:

Invalid document ID: The ID format is incorrect or the index is out of range
Document not found: The specified document doesn’t exist (HTTP 404)
Network errors: Connection timeout or failure to download the document
Parsing errors: Unable to extract or convert the document content
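
Because an error arrives as a message rather than a JSON object, calling `JSON.parse` on it and reading `title` would fail or yield `undefined`. The sketch below separates the two cases. It assumes the error text is delivered in the same text field that normally carries the JSON document, either as plain text or as a JSON-encoded string; `ParsedFetch` and `parseFetchText` are hypothetical names.

```typescript
// Illustrative result type; not part of the tool's API.
type FetchedDoc = { id: string; title: string; text: string; url: string };
type ParsedFetch =
  | { ok: true; doc: FetchedDoc }
  | { ok: false; error: string };

// Assumption: errors share the text field used for successful responses.
function parseFetchText(raw: string): ParsedFetch {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Plain (non-JSON) text, for example an unquoted error message.
    return { ok: false, error: raw };
  }
  if (typeof parsed === 'string') {
    // A JSON-encoded string such as "Fetch failed: ...".
    return { ok: false, error: parsed };
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return { ok: false, error: `Unexpected response: ${raw}` };
  }
  return { ok: true, doc: parsed as FetchedDoc };
}
```

Callers can then branch on `ok` and log or retry on `error` instead of letting a parse exception propagate.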

Usage Examples

import { Client } from '@modelcontextprotocol/sdk/client';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp';

const transport = new StreamableHTTPClientTransport(
  new URL('http://localhost:3000/www.oxygenxml.com/doc/versions/27.1/ug-editor')
);
const client = new Client({ name: 'my-app', version: '1.0.0' });
await client.connect(transport);

// Fetch document by ID
const response = await client.callTool({ 
  name: 'fetch', 
  arguments: { id: '0:topics/wsdl-converter.html' } 
});

const doc = JSON.parse(response.content[0].text);
console.log(`Title: ${doc.title}`);
console.log(`URL: ${doc.url}`);
console.log(`Content: ${doc.text.substring(0, 200)}...`);

Real-World Example

Here’s a complete example from the test suite showing the search-and-fetch workflow:

import { Client } from '@modelcontextprotocol/sdk/client';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp';

const WEBHELP_ENDPOINT = 'www.oxygenxml.com/doc/versions/27.1/ug-editor';

const transport = new StreamableHTTPClientTransport(
  new URL(`http://localhost:3000/${WEBHELP_ENDPOINT}`)
);
const client = new Client({ name: 'e2e-test-client', version: '1.0.0' });
await client.connect(transport);

// Step 1: Search for documents
const searchResp = await client.callTool({ 
  name: 'search', 
  arguments: { query: 'wsdl' } 
});

const searchResults = JSON.parse(searchResp.content[0].text);
const first = searchResults[0];

console.log(`Found ${searchResults.length} results`);
console.log(`Top result: ${first.title} (${first.id})`);

// Step 2: Fetch the document content
const fetchResp = await client.callTool({ 
  name: 'fetch', 
  arguments: { id: first.id } 
});

const doc = JSON.parse(fetchResp.content[0].text);

// Step 3: Verify the content is relevant
if (doc.text.toLowerCase().includes('wsdl')) {
  console.log('✓ Document contains relevant information');
  console.log(`Title: ${doc.title}`);
  console.log(`URL: ${doc.url}`);
  console.log(`Content length: ${doc.text.length} characters`);
} else {
  console.warn('Document may not be relevant');
}

await client.close();

Implementation Details

The fetch tool is implemented in app/[...site]/route.ts (lines 85-112).

Content Processing Pipeline

URL Resolution: The document ID is parsed to determine the base URL and relative path
HTML Download: The complete HTML page is downloaded from the documentation site
Article Extraction: The <article> element is extracted to get only the main content
Markdown Conversion: HTML is converted to clean Markdown using Turndown
Metadata Extraction: Title and URL are extracted for reference

The tool extracts only the <article> element to exclude navigation menus, headers, footers, and other UI elements. This ensures you get clean, focused content.
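
To make the extraction steps concrete, here is a rough, dependency-free sketch of article and title extraction. It is an illustration only: the tool's actual extraction code is not shown in this document, the function names are hypothetical, and regex matching is fragile on real-world HTML (a proper HTML parser is the safer choice). The Markdown conversion step is omitted.

```typescript
// Illustration only: returns the inner HTML of the first <article>,
// or null when the page has none.
function extractArticle(html: string): string | null {
  const match = html.match(/<article\b[^>]*>([\s\S]*?)<\/article>/i);
  return match ? match[1].trim() : null;
}

// Prefers the <title> element and falls back to the first <h1>.
function extractTitle(html: string): string | null {
  const title = html.match(/<title\b[^>]*>([\s\S]*?)<\/title>/i);
  if (title && title[1].trim()) return title[1].trim();
  const h1 = html.match(/<h1\b[^>]*>([\s\S]*?)<\/h1>/i);
  // Strip any nested tags from the heading text.
  return h1 ? h1[1].replace(/<[^>]+>/g, '').trim() : null;
}
```

Note that the lazy match stops at the first closing `</article>` tag, so nested `<article>` elements would be truncated by this simplified version.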

Markdown Conversion

The tool uses Turndown to convert HTML to Markdown with the following configuration:

Heading style: ATX (#, ##, ###)
Code block style: Fenced (```)
List marker: Dash (-)

This produces clean, readable Markdown that’s easy to process programmatically or display to users.

Document ID Format

Document IDs use the format index:relativePath:

Single site: All documents have index 0
- Example: 0:topics/wsdl-converter.html
Federated search: Each base URL has its own index (0, 1, 2, …)
- Example: 0:topics/editor.html (first site)
- Example: 1:topics/author.html (second site)

The index determines which base URL to use when constructing the full document URL.
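
The mapping from ID to URL can be sketched as follows. This is a minimal illustration, not the tool's implementation: `resolveDocumentUrl` and the `baseUrls` array are assumed names, and the error messages merely echo the scenarios listed under Error Response.

```typescript
// Hypothetical helper: splits "index:relativePath" and joins the path
// onto the base URL selected by the index.
function resolveDocumentUrl(id: string, baseUrls: string[]): string {
  const sep = id.indexOf(':');
  if (sep <= 0) {
    throw new Error(`Invalid document ID: ${id}`);
  }
  const indexPart = id.slice(0, sep);
  const relativePath = id.slice(sep + 1);
  if (!/^\d+$/.test(indexPart) || relativePath === '') {
    throw new Error(`Invalid document ID: ${id}`);
  }
  const index = Number(indexPart);
  const base = baseUrls[index];
  if (base === undefined) {
    throw new Error(`Unknown base URL index: ${index}`);
  }
  // A trailing slash makes URL resolution append to the base path
  // instead of replacing its last segment.
  const normalizedBase = base.endsWith('/') ? base : `${base}/`;
  return new URL(relativePath, normalizedBase).toString();
}
```

With the base URL `https://www.oxygenxml.com/doc/versions/27.1/ug-editor` at index 0, the ID `0:topics/wsdl-converter.html` resolves to the `url` value shown in the Success Response example.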

Performance Considerations

Fetching documents requires downloading and processing HTML content. For better performance:

Only fetch documents you need to analyze in detail
Use search results (title, URL) for initial filtering
Consider caching fetched documents if you need to reference them multiple times
Fetch multiple documents in parallel when possible
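
The caching and parallel-fetch suggestions can be combined in a small wrapper. The sketch below is one possible approach under stated assumptions: `Doc`, `Fetcher`, `withCache`, and `fetchAll` are hypothetical names, and the fetcher is passed in so the example does not depend on a running server.

```typescript
// Hypothetical document shape and fetcher signature for illustration.
type Doc = { id: string; title: string; text: string; url: string };
type Fetcher = (id: string) => Promise<Doc>;

// Wraps a fetcher with an in-memory cache keyed by document ID.
// Caching the promise also deduplicates concurrent requests.
function withCache(fetchDoc: Fetcher): Fetcher {
  const cache = new Map<string, Promise<Doc>>();
  return (id) => {
    let pending = cache.get(id);
    if (!pending) {
      pending = fetchDoc(id).catch((err) => {
        cache.delete(id); // do not keep failed fetches in the cache
        throw err;
      });
      cache.set(id, pending);
    }
    return pending;
  };
}

// Fetches several documents concurrently and preserves input order.
function fetchAll(ids: string[], fetchDoc: Fetcher): Promise<Doc[]> {
  return Promise.all(ids.map((id) => fetchDoc(id)));
}
```

In practice the fetcher would wrap `client.callTool({ name: 'fetch', arguments: { id } })` and parse the returned text. The cache here is unbounded, so a long-running process may want an eviction policy.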

Typical Response Times

Small documents (<50KB): 200-500ms
Medium documents (50-200KB): 500-1500ms
Large documents (>200KB): 1500-3000ms

Response times depend on network latency, document size, and HTML complexity.
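
Given these ranges, a client-side timeout somewhat above the slowest expected response may be worth considering. The helper below is a generic sketch; `withTimeout` is a hypothetical name, and any specific budget (for example 5000 ms) is an assumption to tune for your environment.

```typescript
// Rejects if the wrapped promise does not settle within `ms` milliseconds.
// The underlying request is not cancelled; only the wait is abandoned.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A call such as `withTimeout(client.callTool({ name: 'fetch', arguments: { id } }), 5000)` would then fail fast instead of waiting indefinitely on a stalled download.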

Use Cases

Documentation Analysis

// Analyze documentation coverage of a topic
const searchResp = await client.callTool({ 
  name: 'search', 
  arguments: { query: 'API authentication' } 
});

const results = JSON.parse(searchResp.content[0].text);
const topDocs = results.slice(0, 3);

for (const result of topDocs) {
  const fetchResp = await client.callTool({ 
    name: 'fetch', 
    arguments: { id: result.id } 
  });
  
  const doc = JSON.parse(fetchResp.content[0].text);
  
  // Check for specific information
  const hasCodeExamples = doc.text.includes('```');
  const mentionsOAuth = doc.text.toLowerCase().includes('oauth');
  
  console.log(`${doc.title}:`);
  console.log(`  - Has code examples: ${hasCodeExamples}`);
  console.log(`  - Mentions OAuth: ${mentionsOAuth}`);
}

Content Extraction

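// Extract specific sections from documentation
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function extractSection(markdown: string, heading: string): string {
  // Anchor at line start (flag m) so "### Heading" is not matched, and
  // stop at the next "## " heading or at the end of the document.
  const regex = new RegExp(
    `^## ${escapeRegExp(heading)}[\\s\\S]*?(?=\\n## |(?![\\s\\S]))`,
    'im'
  );
  const match = markdown.match(regex);
  return match ? match[0] : '';
}

// fetchResp is the result of an earlier fetch tool call
const doc = JSON.parse(fetchResp.content[0].text);
const installSection = extractSection(doc.text, 'Installation');
const usageSection = extractSection(doc.text, 'Usage');

console.log('Installation instructions:', installSection);
console.log('Usage instructions:', usageSection);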

Citation and References

// Generate citations for documentation references
function generateCitation(doc: { title: string; url: string }): string {
  return `[${doc.title}](${doc.url})`;
}

const doc = JSON.parse(fetchResp.content[0].text);
const citation = generateCitation(doc);

console.log(`For more information, see: ${citation}`);

Related Tools

search - Search documentation content to find relevant documents
