fetch_url Tool

The fetch_url tool fetches a single web page and converts its content to Markdown without indexing or crawling. Useful for reading documentation pages on-demand.

Tool Definition

MCP Tool Name: fetch_url Source: src/tools/FetchUrlTool.ts Description: Fetch a single URL and convert its content to Markdown. Unlike scrape_docs, this tool only processes one page without crawling or storing the content.

Parameters

url

string

required

The URL to fetch and convert to markdown. Must be a valid HTTP/HTTPS URL or file:// URL.

followRedirects

boolean

default:"true"

Whether to follow HTTP redirects (3xx responses). When false, throws an error on redirect.

scrapeMode

'fetch' | 'playwright' | 'auto'

default:"auto"

HTML processing strategy:

fetch: Simple DOM parser (faster, less JS support)
playwright: Headless browser (slower, full JS support)
auto: Automatically select the best strategy (currently defaults to playwright)

headers

Record<string, string>

Custom HTTP headers to send with the request (e.g., for authentication).Example:

{
  "Authorization": "Bearer token123",
  "User-Agent": "MyBot/1.0"
}

Response

Returns the processed Markdown content as a string.

TypeScript Types

interface FetchUrlToolOptions {
  url: string;
  followRedirects?: boolean;
  scrapeMode?: ScrapeMode;
  headers?: Record<string, string>;
}

enum ScrapeMode {
  Fetch = "fetch",
  Playwright = "playwright",
  Auto = "auto"
}

See src/tools/FetchUrlTool.ts:11-38 for complete type definitions.

Example Requests

Basic Fetch

{
  "name": "fetch_url",
  "arguments": {
    "url": "https://react.dev/reference/react/useState"
  }
}

Fetch with Custom Headers

{
  "name": "fetch_url",
  "arguments": {
    "url": "https://docs.private-company.com/api",
    "headers": {
      "Authorization": "Bearer token123"
    }
  }
}

Fetch without Following Redirects

{
  "name": "fetch_url",
  "arguments": {
    "url": "https://example.com/old-page",
    "followRedirects": false
  }
}

Fetch with Playwright

{
  "name": "fetch_url",
  "arguments": {
    "url": "https://spa-app.com/docs",
    "scrapeMode": "playwright"
  }
}

Example Response

# useState

`useState` is a React Hook that lets you add a state variable to your component.

```js
const [state, setState] = useState(initialState);

Reference

`useState(initialState)`

Call useState at the top level of your component to declare a state variable.

import { useState } from 'react';

function MyComponent() {
  const [age, setAge] = useState(28);
  // ...
}

Parameters:

initialState: The value you want the state to be initially. It can be a value of any type, but there is a special behavior for functions…

## MCP Output

When called through MCP (see `src/mcp/mcpServer.ts:458`), the response is the raw Markdown content.

## Error Cases

### Invalid URL

Throws `ValidationError`:

```json
{
  "error": "Invalid URL: not-a-url. Must be an HTTP/HTTPS URL or a file:// URL."
}

Fetch Failed

Throws ToolError:

{
  "error": "Unable to fetch or process the URL \"https://example.com/404\". Please verify the URL is correct and accessible."
}

Empty Content

Throws ToolError:

{
  "error": "Processing resulted in empty content for https://example.com/empty"
}

Content Processing Pipeline

The tool uses a pipeline architecture to process different content types:

HTML: Converted to Markdown using HTML pipeline
Markdown: Normalized and cleaned
Plain Text: Returned as-is
Other: Returned as raw text with charset detection

See src/tools/FetchUrlTool.ts:96-118 for pipeline selection logic.

Supported Protocols

HTTP/HTTPS

Fetches web pages using the AutoDetectFetcher:

Uses node-fetch by default
Automatically retries on transient errors (up to 3 times)
Respects followRedirects setting

file://

Reads local files:

{
  "url": "file:///home/user/docs/readme.md"
}

Supports absolute paths only
No redirect handling
Infers MIME type from file extension

Usage from MCP Clients

Claude Desktop

Fetch the content from https://react.dev/reference/react/useState and show it to me.

Direct MCP Call

import { Client } from '@modelcontextprotocol/sdk/client/index.js';

const result = await client.callTool({
  name: 'fetch_url',
  arguments: {
    url: 'https://react.dev/reference/react/useState',
    followRedirects: true
  }
});

console.log(result);

Implementation Details

The tool validates the URL and delegates to the AutoDetectFetcher:

async execute(options: FetchUrlToolOptions): Promise<string> {
  const { url, scrapeMode = ScrapeMode.Auto, headers } = options;
  
  // Validate URL
  if (!this.fetcher.canFetch(url)) {
    throw new ValidationError(
      `Invalid URL: ${url}. Must be an HTTP/HTTPS URL or a file:// URL.`,
      this.constructor.name,
    );
  }
  
  try {
    logger.info(`📡 Fetching ${url}...`);
    
    // Fetch content
    const rawContent: RawContent = await this.fetcher.fetch(url, {
      followRedirects: options.followRedirects ?? true,
      maxRetries: 3,
      headers,
    });
    
    logger.info("🔄 Processing content...");
    
    // Process content through pipelines
    let processed: Awaited<PipelineResult> | undefined;
    for (const pipeline of this.pipelines) {
      if (pipeline.canProcess(rawContent.mimeType, rawContent.content)) {
        processed = await pipeline.process(rawContent, {...}, this.fetcher);
        break;
      }
    }
    
    if (!processed) {
      // Fallback: return raw content with charset detection
      const resolvedCharset = resolveCharset(
        rawContent.charset,
        rawContent.content,
        rawContent.mimeType,
      );
      return convertToString(rawContent.content, resolvedCharset);
    }
    
    if (!processed.textContent?.trim()) {
      throw new ToolError(
        `Processing resulted in empty content for ${url}`,
        this.constructor.name,
      );
    }
    
    logger.info(`✅ Successfully processed ${url}`);
    return processed.textContent;
  } catch (error) {
    // Error handling
    if (error instanceof ToolError) {
      throw error;
    }
    throw new ToolError(
      `Unable to fetch or process the URL "${url}". Please verify the URL is correct and accessible.`,
      this.constructor.name,
    );
  } finally {
    // Cleanup pipelines and fetcher
    await Promise.allSettled([
      ...this.pipelines.map((pipeline) => pipeline.close()),
      this.fetcher.close(),
    ]);
  }
}

See src/tools/FetchUrlTool.ts:71 for the complete implementation.

Resource Cleanup

The tool automatically cleans up resources after execution:

Closes all processing pipelines
Closes the fetcher (including browser instances if Playwright was used)

See src/tools/FetchUrlTool.ts:158-162 for cleanup logic.

scrape_docs - Index entire documentation sites
search_docs - Search indexed documentation

Use Cases

Quick Documentation Lookup

Read a specific page without indexing:

const content = await fetchUrl({
  url: "https://react.dev/reference/react/useState"
});

Testing URLs Before Scraping

Verify a URL is accessible before starting a full scrape:

try {
  await fetchUrl({ url: "https://docs.example.com" });
  // URL is accessible, start scraping
  await scrapeDocs({ url: "https://docs.example.com", ... });
} catch (error) {
  console.error("URL not accessible:", error);
}

Reading Private Documentation

Access authenticated documentation:

const content = await fetchUrl({
  url: "https://internal.company.com/docs",
  headers: {
    "Authorization": "Bearer token123"
  }
});

Converting Local Files

Convert local HTML files to Markdown:

const content = await fetchUrl({
  url: "file:///home/user/docs/guide.html"
});

Performance Considerations

Scrape Mode Selection

fetch (fastest): Use for static HTML pages
- ~100-500ms per page
- No JavaScript execution
playwright (slower): Use for JavaScript-heavy sites
- ~1-3s per page
- Full browser rendering
auto (smart): Lets the system decide
- Currently defaults to Playwright for maximum compatibility

Custom Headers

Adding custom headers adds minimal overhead (~10ms).

Redirect Handling

Following redirects adds one round-trip per redirect (~50-200ms).

Available Tools

Tool Definition

Parameters

Response

TypeScript Types

Example Requests

Basic Fetch

Fetch with Custom Headers

Fetch without Following Redirects

Fetch with Playwright

Example Response

Reference

`useState(initialState)`

Fetch Failed

Empty Content

Content Processing Pipeline

Supported Protocols

HTTP/HTTPS

file://

Usage from MCP Clients

Claude Desktop

Direct MCP Call

Implementation Details

Resource Cleanup

Use Cases

Quick Documentation Lookup

Testing URLs Before Scraping

Reading Private Documentation

Converting Local Files

Performance Considerations

Scrape Mode Selection

Custom Headers

Redirect Handling

Build docs developers (and LLMs) love

Available Tools

​Tool Definition

​Parameters

​Response

​TypeScript Types

​Example Requests

​Basic Fetch

​Fetch with Custom Headers

​Fetch without Following Redirects

​Fetch with Playwright

​Example Response

​Reference

​useState(initialState)

​Fetch Failed

​Empty Content

​Content Processing Pipeline

​Supported Protocols

​HTTP/HTTPS

​file://

​Usage from MCP Clients

​Claude Desktop

​Direct MCP Call

​Implementation Details

​Resource Cleanup

​Related Tools

​Use Cases

​Quick Documentation Lookup

​Testing URLs Before Scraping

​Reading Private Documentation

​Converting Local Files

​Performance Considerations

​Scrape Mode Selection

​Custom Headers

​Redirect Handling

Build docs developers (and LLMs) love

Tool Definition

Parameters

Response

TypeScript Types

Example Requests

Basic Fetch

Fetch with Custom Headers

Fetch without Following Redirects

Fetch with Playwright

Example Response

Reference

`useState(initialState)`

Fetch Failed

Empty Content

Content Processing Pipeline

Supported Protocols

HTTP/HTTPS

file://

Usage from MCP Clients

Claude Desktop

Direct MCP Call

Implementation Details

Resource Cleanup

Related Tools

Use Cases

Quick Documentation Lookup

Testing URLs Before Scraping

Reading Private Documentation

Converting Local Files

Performance Considerations

Scrape Mode Selection

Custom Headers

Redirect Handling