Grounded Docs supports indexing documentation from multiple source types. Each source has its own strategy for fetching and processing content.

Supported Source Types

  • Websites: HTTP/HTTPS documentation sites
  • GitHub: Repositories, wikis, and specific branches
  • npm: npm package READMEs and docs
  • PyPI: Python package documentation
  • Local Files: Filesystem directories and files
  • ZIP Archives: Compressed documentation bundles

Web Documentation

Scrape documentation from HTTP/HTTPS URLs.

Basic Usage

npx @arabold/docs-mcp-server@latest scrape react https://react.dev/reference/react

Advanced Options

  • maxPages (number, default: 1000): Maximum number of pages to scrape
  • maxDepth (number, default: 10): Maximum link depth to follow
  • includePatterns (string[]): Only scrape URLs matching these glob patterns. Example: ["**/api/**", "**/reference/**"]
  • excludePatterns (string[]): Skip URLs matching these patterns. Example: ["**/blog/**", "**/changelog/**"]

Examples

npx @arabold/docs-mcp-server@latest scrape react \
  --version 18.3.1 \
  --max-pages 500 \
  --include '**/reference/**' \
  https://react.dev/reference/react
Use --include patterns to focus on specific documentation sections and speed up indexing.
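
To build intuition for how include and exclude patterns narrow a crawl, here is a minimal shell sketch. It is not the server's matcher: matches_any and should_scrape are hypothetical helpers, the example.com URLs are made up, the shell's case patterns only approximate glob semantics (here * also matches /), and letting excludes take precedence over includes is an assumption.

```shell
# Hypothetical helpers for illustration; not part of docs-mcp-server.

# Succeeds if the URL in $1 matches any of the patterns that follow it.
matches_any() {
  candidate=$1
  shift
  for pattern in "$@"; do
    # An unquoted $pattern is interpreted as a shell pattern, where '*' also matches '/'.
    case $candidate in
      $pattern) return 0 ;;
    esac
  done
  return 1
}

# Assumed rule: keep a URL only if it matches an include pattern and no exclude pattern.
should_scrape() {
  matches_any "$1" '**/api/**' '**/reference/**' || return 1
  if matches_any "$1" '**/blog/**' '**/changelog/**'; then
    return 1
  fi
  return 0
}

for url in \
  https://example.com/reference/hooks \
  https://example.com/api/changelog/v2 \
  https://example.com/blog/release-notes
do
  if should_scrape "$url"; then
    echo "scrape $url"
  else
    echo "skip   $url"
  fi
done
```

Under these assumptions only the first URL is kept: the second matches an include pattern but is also caught by an exclude pattern, and the third matches no include pattern at all.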

GitHub Repositories

Index documentation directly from GitHub repositories.

URL Formats

https://github.com/facebook/react

GitHub Authentication

For private repositories or to avoid rate limits, configure GitHub authentication:
GITHUB_TOKEN="ghp_your_personal_access_token" \
npx @arabold/docs-mcp-server@latest scrape my-lib https://github.com/owner/repo
See Authentication for complete GitHub auth setup.

Supported File Types

The GitHub strategy processes:
  • Markdown files (.md, .mdx)
  • Documentation files (.txt, .rst)
  • Source code (.js, .ts, .py, .java, etc.)
  • Configuration files (.json, .yaml, .toml)

Examples

npx @arabold/docs-mcp-server@latest scrape typescript \
  --version 5.3 \
  https://github.com/microsoft/TypeScript

npm Packages

Index npm package documentation and READMEs.

URL Format

npm:<package-name>

Examples

npx @arabold/docs-mcp-server@latest scrape lodash npm:lodash --version 4.17.21

What Gets Indexed

  • README.md and documentation files
  • package.json metadata
  • TypeScript type definitions
npm packages are downloaded as published tarballs and extracted. Documentation files are automatically detected and processed.

PyPI Packages

Index Python package documentation from PyPI.

URL Format

pypi:<package-name>

Examples

npx @arabold/docs-mcp-server@latest scrape requests pypi:requests --version 2.31.0

What Gets Indexed

  • README and documentation files
  • Package metadata
  • Python source code docstrings
These items are extracted from the package's PyPI distributions.

Local Files

Index documentation from your local filesystem.

URL Format

file:///<absolute-path>

Platform Examples

Single File
file:///Users/me/docs/README.md
Directory
file:///home/user/project/docs

Supported File Types

  • Documentation: .md, .mdx, .txt, .rst, .asciidoc
  • Source Code: .js, .ts, .py, .java, .go, .rs, etc.
  • Office Docs: .docx, .xlsx, .pptx
  • Others: .pdf, .html, .json, .yaml

Docker Volumes

When using Docker, mount local directories:
1. Mount the volume

docker run --rm \
  -v /path/to/docs:/docs:ro \
  -v docs-mcp-data:/data \
  -p 6280:6280 \
  ghcr.io/arabold/docs-mcp-server:latest

2. Use container path

# Inside container or via web UI
npx @arabold/docs-mcp-server@latest scrape my-docs file:///docs
Local file paths must be absolute. Relative paths are not supported.
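
Because only absolute paths are accepted, a relative path has to be resolved before it is passed to scrape. A minimal sketch, assuming a POSIX shell: to_file_url is a hypothetical helper, not part of the CLI, and it does not percent-encode spaces or other special characters.

```shell
# Hypothetical helper: print a file:// URL for a file or directory path.
# The directory (or the file's parent directory) must exist.
to_file_url() {
  if [ -d "$1" ]; then
    # Directory: resolve it directly to an absolute physical path.
    # CDPATH is cleared so that cd does not print anything unexpected.
    abs=$(CDPATH= cd -- "$1" && pwd -P) || return 1
  else
    # File: resolve the parent directory, then re-attach the file name.
    dir=$(CDPATH= cd -- "$(dirname -- "$1")" && pwd -P) || return 1
    abs=${dir%/}/$(basename -- "$1")
  fi
  printf 'file://%s\n' "$abs"
}

# Example (not run here): index ./docs relative to the current directory.
# npx @arabold/docs-mcp-server@latest scrape my-docs "$(to_file_url ./docs)"
```

Resolving the path in the shell keeps the scrape command itself unchanged; when running in Docker, the resolved path must be the one inside the container.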

Examples

npx @arabold/docs-mcp-server@latest scrape my-project \
  --version 1.0.0 \
  file:///Users/me/projects/my-app/docs

ZIP Archives

Index documentation from ZIP files.

URL Format

file:///<absolute-path-to-zip>

Example

npx @arabold/docs-mcp-server@latest scrape archived-docs \
  --version 2.0 \
  file:///Users/me/downloads/docs-archive.zip

Processing

1. Extract: ZIP file is extracted to a temporary directory
2. Process: All supported file types are indexed
3. Clean Up: Temporary files are deleted after processing
ZIP archives are processed the same way as local directories. All supported file types are extracted and indexed.
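
The three steps can be pictured with a small shell sketch. This is an illustration only, not the server's implementation: list_zip_docs is a hypothetical helper, it assumes the unzip utility is installed, and the "process" step merely lists a few documentation file types instead of indexing them.

```shell
# Hypothetical sketch of extract -> process -> clean up; assumes `unzip` is installed.
list_zip_docs() {
  archive=$1
  # 1. Extract: unpack the archive into a fresh temporary directory.
  workdir=$(mktemp -d) || return 1
  if unzip -q "$archive" -d "$workdir"; then
    # 2. Process: stand-in for indexing; list some documentation files.
    (
      cd "$workdir" &&
        find . -type f \( -name '*.md' -o -name '*.mdx' -o -name '*.txt' -o -name '*.rst' \) |
        sort
    )
    status=$?
  else
    status=1
  fi
  # 3. Clean up: remove the temporary files whether or not processing succeeded.
  rm -rf "$workdir"
  return "$status"
}
```

Calling list_zip_docs with the path to an archive prints the matching files relative to the archive root and leaves no temporary directory behind.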

Best Practices

Use semantic versioning patterns:
  • Exact: 1.2.3, 18.3.1
  • X-Range: 18.x, 3.x (matches latest minor/patch)
  • Latest: latest (for most recent version)
Consistent version naming helps with search and version resolution.
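
To make the X-Range idea concrete, the sketch below picks the highest version matching a range such as 18.x from a list. resolve_x_range is a hypothetical helper and the version list is made up; it handles only plain numeric major.minor.patch versions, and how the server itself resolves ranges is not shown here.

```shell
# Hypothetical helper: read versions on stdin, print the highest one matching an X-Range.
resolve_x_range() {
  prefix=${1%x}   # "18.x" -> "18."
  # Keep versions that start with the prefix, sort each numeric field, take the last.
  awk -v p="$prefix" 'index($0, p) == 1' |
    sort -t . -k 1,1n -k 2,2n -k 3,3n |
    tail -n 1
}

# Made-up version list for illustration.
printf '%s\n' 17.0.2 18.2.0 18.10.0 18.3.1 19.0.0 | resolve_x_range 18.x
```

The numeric sort matters: a plain text sort would rank 18.3.1 above 18.10.0.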
Use include/exclude patterns to:
  • Focus on relevant sections (**/api/**, **/reference/**)
  • Skip non-documentation (**/blog/**, **/changelog/**)
  • Reduce indexing time and storage
  • Improve search relevance
Use the organization field to group related libraries:
# React ecosystem
--organization facebook

# Internal company docs
--organization acme-corp

# Tools and frameworks
--organization dev-tools

Choose the command that fits the update:
  • Use refresh for updates to existing documentation (faster, uses ETags)
  • Use scrape for new libraries or complete re-indexing
See the refresh command for details.

Troubleshooting

Problem: HTTP 429 errors or slow scraping
Solutions:
  • Reduce --max-concurrency (default: 5)
  • Use --delay to add delays between requests
  • For GitHub: Add GITHUB_TOKEN for higher rate limits

Problem: 401/403 errors for private content
Solutions:
  • Add appropriate credentials (GITHUB_TOKEN, etc.)
  • Verify token has required permissions
  • Check token expiration

Problem: Local file paths fail
Solutions:
  • Use absolute paths, not relative
  • Check file/directory exists and is readable
  • For Docker: Verify volume mount is correct
  • Check path uses forward slashes

Problem: Scraping takes too long or times out
Solutions:
  • Use --include patterns to limit scope
  • Lower --max-pages to cap the size of the crawl
  • For GitHub: Target specific directories instead of full repo
  • Consider splitting into multiple libraries

Next Steps

  • Search Documentation: Learn how to search your indexed documentation
  • CLI Reference: Complete scrape command reference
  • Configuration: Configure scraper options and defaults
  • MCP Tools: Use scraping from your AI assistant
