scrape_docs
Index documentation from a URL for a library version. MCP Tool Name: `scrape_docs`
Source: src/tools/ScrapeTool.ts
Parameters
Library name to index
Documentation root URL to scrape (must be valid HTTP/HTTPS)
Library version (e.g., “5.4.2”, “18.0.0”). Omit or pass null for unversioned docs.
Scraping configuration options
Maximum number of pages to scrape
Maximum navigation depth from the root URL
Crawling boundary:
- subpages: Only crawl URLs within the same path as the starting URL
- hostname: Crawl any URL on the same hostname
- domain: Crawl any URL on the same top-level domain (including subdomains)
Whether to follow HTTP redirects (3xx responses). When false, throws RedirectError.
Maximum number of concurrent requests
Continue scraping if individual pages fail
HTML processing strategy:
- fetch: Simple DOM parser (faster, less JS support)
- playwright: Headless browser (slower, full JS support)
- auto: Automatically select best strategy
Regex patterns for including URLs. Patterns must be wrapped in slashes: `/pattern/`
Regex patterns for excluding URLs. Exclude takes precedence over include. Patterns must be wrapped in slashes: `/pattern/`
Custom HTTP headers to send with each request (e.g., for authentication)
If true, clears existing documents for the library version before scraping. If false, appends to existing documents.
Whether to wait for scraping to complete. When false (default for MCP), returns jobId immediately.
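The include/exclude filtering described above can be sketched like this (an illustrative sketch only, not the actual implementation in src/tools/ScrapeTool.ts; the function names are hypothetical):

```typescript
// Strip the wrapping slashes from a "/pattern/" string and compile it.
function compilePattern(p: string): RegExp {
  const m = p.match(/^\/(.*)\/$/);
  return new RegExp(m ? m[1] : p);
}

// Decide whether a URL should be scraped. Exclude takes precedence
// over include; with no include patterns, every URL is included.
function shouldScrape(url: string, include: string[], exclude: string[]): boolean {
  if (exclude.some((p) => compilePattern(p).test(url))) return false;
  if (include.length === 0) return true;
  return include.some((p) => compilePattern(p).test(url));
}
```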
Response Structure
UUID of the scraping job (returned when waitForCompletion is false)
Number of pages successfully scraped (returned when waitForCompletion is true)
TypeScript Types
See src/tools/ScrapeTool.ts:8-73 for complete type definitions.
Example Request
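A request might look like the following (illustrative only; the exact parameter names are defined in src/tools/ScrapeTool.ts:8-73 and may differ):

```json
{
  "library": "react",
  "url": "https://react.dev/reference/react",
  "version": "18.2.0",
  "options": {
    "maxPages": 200,
    "scope": "subpages"
  }
}
```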
Example Response
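When waitForCompletion is false, the response carries only the job UUID (field name as described above; the UUID value is illustrative). When waitForCompletion is true, the response instead reports the number of pages scraped.

```json
{
  "jobId": "3f1c2a9e-7b4d-4e2a-9c1d-8a5b6c7d8e9f"
}
```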
MCP Output
When called through MCP (see src/mcp/mcpServer.ts:42):
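MCP tool results are wrapped in the protocol's standard content envelope, so the jobId arrives as text content, along these lines (the message wording is illustrative):

```json
{
  "content": [
    {
      "type": "text",
      "text": "Scraping started. Job ID: 3f1c2a9e-7b4d-4e2a-9c1d-8a5b6c7d8e9f"
    }
  ]
}
```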
Error Cases
Invalid version format

Version Format Support
The tool accepts and normalizes various version formats:
- Full semver: `"18.2.0"` → `"18.2.0"`
- Partial version: `"18.2"` → `"18.2.0"` (coerced)
- Major only: `"18"` → `"18.0.0"` (coerced)
- Unversioned: `null` or `""` → `""` (empty string)
- Prerelease: `"18.3.0-rc.1"` → `"18.3.0-rc.1"`
See src/tools/ScrapeTool.ts:98-124 for the version normalization logic.
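The normalization rules above can be sketched roughly like this (a simplified sketch; the real logic in src/tools/ScrapeTool.ts:98-124 may differ):

```typescript
// Coerce a version string to full semver, mirroring the table above:
// "18" -> "18.0.0", "18.2" -> "18.2.0", null/"" -> "", and full or
// prerelease versions pass through unchanged.
function normalizeVersion(version: string | null): string {
  if (version === null || version === "") return "";
  const m = version.match(/^(\d+)(?:\.(\d+))?(?:\.(\d+))?(-.*)?$/);
  if (!m) throw new Error(`Invalid version format: ${version}`);
  const [, major, minor = "0", patch = "0", pre = ""] = m;
  return `${major}.${minor}.${patch}${pre}`;
}
```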
Scope Examples
Given starting URL `https://react.dev/reference/react/hooks`:
subpages (default):
- ✅ `https://react.dev/reference/react/hooks/useState`
- ✅ `https://react.dev/reference/react/hooks/useEffect`
- ❌ `https://react.dev/learn` (different path)
- ❌ `https://legacy.reactjs.org/` (different hostname)

hostname:
- ✅ `https://react.dev/reference/react/hooks`
- ✅ `https://react.dev/learn`
- ❌ `https://legacy.reactjs.org/` (different hostname)

domain:
- ✅ `https://react.dev/reference/react/hooks`
- ✅ `https://legacy.react.dev/` (subdomain)
- ❌ `https://reactjs.org/` (different domain)
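The scope rules above can be sketched as a predicate over the starting URL and a candidate URL (an illustrative sketch; the domain check uses a naive last-two-labels heuristic rather than a real public-suffix list):

```typescript
// Decide whether a candidate URL falls inside the crawl scope,
// relative to the starting URL.
function inScope(
  start: string,
  candidate: string,
  scope: "subpages" | "hostname" | "domain"
): boolean {
  const s = new URL(start);
  const c = new URL(candidate);
  switch (scope) {
    case "subpages": {
      // Same hostname, and the candidate path sits under the start path.
      const base = s.pathname.endsWith("/") ? s.pathname : s.pathname + "/";
      return c.hostname === s.hostname && (c.pathname + "/").startsWith(base);
    }
    case "hostname":
      return c.hostname === s.hostname;
    case "domain": {
      // Compare the registrable domain (naive: last two labels).
      const tail = (h: string) => h.split(".").slice(-2).join(".");
      return tail(c.hostname) === tail(s.hostname);
    }
  }
}
```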
refresh_version
Re-scrape a previously indexed library version, updating only changed pages. MCP Tool Name: `refresh_version`
Source: src/tools/RefreshTool.ts (uses same underlying ScrapeTool)
Parameters
Library name to refresh
Version to refresh (defaults to latest if omitted)
Response Structure
UUID of the refresh job
Example Request
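A refresh request might look like this (illustrative only; parameter names follow src/tools/RefreshTool.ts and may differ):

```json
{
  "library": "react",
  "version": "18.2.0"
}
```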
MCP Output
Difference from scrape_docs
- scrape_docs: Indexes from scratch (clears existing data by default)
- refresh_version: Re-scrapes existing version, only updates changed pages
remove_docs
Remove indexed documentation for a library version. MCP Tool Name: `remove_docs`
Source: src/tools/RemoveTool.ts
Parameters
Library name
Version to remove (defaults to latest if omitted)
Response Structure
Confirmation message
Example Request
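A removal request might look like this (illustrative only; parameter names follow src/tools/RemoveTool.ts and may differ):

```json
{
  "library": "react",
  "version": "18.2.0"
}
```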
MCP Output
Error Cases
Library not found

Job Management
Since scraping operations are asynchronous, use the job management tools to monitor progress:
- list_jobs - See all scraping jobs
- get_job_info - Check job status and progress
- cancel_job - Stop a running job
Example Workflow
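A typical sequence, sketched as MCP tool calls (tool names are from this document; argument names and the jobId value are illustrative):

```json
[
  { "tool": "scrape_docs", "arguments": { "library": "react", "url": "https://react.dev/reference/react", "version": "18.2.0" } },
  { "tool": "get_job_info", "arguments": { "jobId": "3f1c2a9e-7b4d-4e2a-9c1d-8a5b6c7d8e9f" } },
  { "tool": "search_docs", "arguments": { "library": "react", "version": "18.2.0", "query": "useState" } }
]
```

The first call returns a jobId; poll get_job_info until the job completes, then search the indexed documentation.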
Related Tools
- list_jobs - Monitor scraping jobs
- get_job_info - Check job progress
- cancel_job - Cancel a running job
- search_docs - Search indexed documentation
