The Docs MCP Server uses a unified configuration system that aggregates settings from multiple sources, validating them against a strict schema.

Configuration File

By default, configuration is stored in your system’s preferences directory. On macOS, this is:
~/Library/Preferences/docs-mcp-server/config.yaml

Example Config File

```yaml
# config.yaml
app:
  storePath: ~/.docs-mcp-server
  telemetryEnabled: true
  embeddingModel: text-embedding-3-small

scraper:
  maxPages: 1000
  maxDepth: 3
  document:
    maxSize: 10485760  # 10MB

splitter:
  preferredChunkSize: 1500
  maxChunkSize: 5000
```
On startup, the server automatically updates this default config file with any new default settings.

Using a Custom Config File

You can specify a custom config file with the --config flag or the DOCS_MCP_CONFIG environment variable:
```sh
docs-mcp-server --config /path/to/config.yaml
```
Explicitly specified config files are treated as read-only: the server never modifies them.
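As an alternative to --config, DOCS_MCP_CONFIG can point at a file you generate yourself. The sketch below writes a small config to a temporary path (the location and values are placeholders) and exports the variable. It assumes, as the merge order in Configuration Priority implies, that settings omitted from the file fall back to their defaults.

```sh
# Placeholder location; any writable path works.
CONFIG_FILE="${TMPDIR:-/tmp}/docs-mcp-custom.yaml"

# Write a partial config containing only the settings you want to change.
cat > "$CONFIG_FILE" <<'EOF'
scraper:
  maxPages: 500
  maxDepth: 2
EOF

# Point docs-mcp-server at the file through the environment
# instead of the --config flag.
export DOCS_MCP_CONFIG="$CONFIG_FILE"
echo "Using config: $DOCS_MCP_CONFIG"
```

With the variable exported, running docs-mcp-server without --config picks up this file, and because it was specified explicitly, the server leaves it unmodified.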

Configuration Priority

Configuration values are merged from multiple sources, with later sources taking precedence:
  1. Defaults (lowest priority)
  2. Config File
  3. Environment Variables
  4. CLI Arguments (highest priority)
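For a single scalar setting, this precedence amounts to "the highest-priority source that supplies a value wins". The helper below is a hypothetical illustration, not part of docs-mcp-server: it treats an empty string as "not provided", whereas the real merge also combines nested sections and validates the result against the schema.

```sh
# Hypothetical helper illustrating the merge order for one setting.
# Arguments are checked from highest to lowest priority.
# Usage: resolve_setting CLI_VALUE ENV_VALUE FILE_VALUE DEFAULT_VALUE
resolve_setting() {
  for candidate in "$1" "$2" "$3" "$4"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# scraper.maxPages: no CLI flag, env var set to 2000, config file says 1000.
resolve_setting "" "2000" "1000" "1000"
```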

Environment Variables

Any configuration setting can be overridden via environment variables using the naming convention:
DOCS_MCP_<SECTION>_<SETTING>
Rules:
  • Convert camelCase to UPPER_SNAKE_CASE
  • Join nested paths with underscores
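The two rules are mechanical enough to script. This hypothetical helper applies them to a dotted setting path and adds the DOCS_MCP_ prefix. It assumes setting names are plain camelCase without consecutive capitals, which holds for every setting in the reference below.

```sh
# Hypothetical helper: derive the environment variable for a setting path.
to_env_var() {
  # 1. insert "_" at each lowercase/digit-to-uppercase boundary (maxSize -> max_Size)
  # 2. replace the dots between nested sections with "_"
  # 3. upper-case the result, then add the DOCS_MCP_ prefix
  suffix=$(printf '%s' "$1" |
    sed -e 's/\([[:lower:][:digit:]]\)\([[:upper:]]\)/\1_\2/g' -e 's/\./_/g' |
    tr '[:lower:]' '[:upper:]')
  printf 'DOCS_MCP_%s\n' "$suffix"
}

to_env_var scraper.document.maxSize   # DOCS_MCP_SCRAPER_DOCUMENT_MAX_SIZE
to_env_var server.heartbeatMs         # DOCS_MCP_SERVER_HEARTBEAT_MS
```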

Common Environment Variables

```sh
# Override scraper settings
export DOCS_MCP_SCRAPER_MAX_PAGES=2000
export DOCS_MCP_SCRAPER_DOCUMENT_MAX_SIZE=52428800  # 50MB
```

Legacy Aliases

Some settings have legacy aliases for convenience:
  • PORT: alias for server.ports.default
  • HOST: alias for server.host

CLI Arguments

Common settings have dedicated CLI flags:
```sh
docs-mcp-server --port 8080 --host 0.0.0.0
docs-mcp-server --store-path /data/docs --read-only
```

CLI Configuration Commands

Manage configuration directly from the command line:
```sh
# View current configuration (JSON format)
docs-mcp-server config

# View current configuration (YAML format)
docs-mcp-server config --yaml
```

Configuration Reference

App Settings

General application settings.
app.storePath
string
default:"~/.docs-mcp-server"
Directory for storing databases and logs.
app.telemetryEnabled
boolean
default:"true"
Enable anonymous usage telemetry.
app.readOnly
boolean
default:"false"
Prevent modification of data (scraping/indexing).
app.embeddingModel
string
default:"text-embedding-3-small"
Model to use for vector embeddings. Format: provider:model_name, or just model_name for OpenAI models. Examples:
  • openai:text-embedding-3-small (default)
  • vertex:text-embedding-004 (Google Cloud Vertex AI)
  • gemini:gemini-embedding-exp-03-07 (Google Generative AI)
  • aws:amazon.titan-embed-text-v1
  • microsoft:text-embedding-ada-002
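Assuming the spec is split at the first colon, model names that themselves contain dots, such as amazon.titan-embed-text-v1, pass through intact. The function below sketches that parsing in shell for illustration only; it is not the server's own parser.

```sh
# Hypothetical parser for the provider:model_name format.
# A spec without a colon is treated as an OpenAI model name.
parse_embedding_model() {
  case "$1" in
    *:*) provider=${1%%:*}; model=${1#*:} ;;  # split at the first colon
    *)   provider=openai;   model=$1 ;;
  esac
  printf '%s %s\n' "$provider" "$model"
}

parse_embedding_model aws:amazon.titan-embed-text-v1   # aws amazon.titan-embed-text-v1
parse_embedding_model text-embedding-3-small           # openai text-embedding-3-small
```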

Server Settings

Settings for the API and MCP servers.
server.protocol
string
default:"auto"
Server protocol: stdio, http, or auto.
server.host
string
default:"127.0.0.1"
Host interface to bind to.
server.heartbeatMs
number
default:"30000"
MCP protocol heartbeat interval in milliseconds.
server.ports.default
number
default:"6280"
Default port for the main server.
server.ports.worker
number
default:"8080"
Port for the background worker service.
server.ports.mcp
number
default:"6280"
Port for the MCP interface.
server.ports.web
number
default:"6281"
Port for the web dashboard.

Authentication Settings

Security settings for the HTTP server.
auth.enabled
boolean
default:"false"
Enable JWT authentication.
auth.issuerUrl
string
OIDC issuer URL of your identity provider (e.g., Clerk, Auth0).
auth.audience
string
Expected JWT audience claim.
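Following the DOCS_MCP_<SECTION>_<SETTING> convention, the authentication settings map to the environment variables below. The issuer and audience values are placeholders for your own identity provider, and the sketch assumes the schema accepts the string true for the boolean auth.enabled.

```sh
# Enable JWT authentication via environment variables (placeholder values).
export DOCS_MCP_AUTH_ENABLED=true
export DOCS_MCP_AUTH_ISSUER_URL="https://your-tenant.example.com/"
export DOCS_MCP_AUTH_AUDIENCE="https://docs-mcp.example.com"

# OIDC providers publish their metadata at a well-known path, which is a
# quick way to sanity-check the issuer URL (requires network access):
# curl -fsS "${DOCS_MCP_AUTH_ISSUER_URL%/}/.well-known/openid-configuration"
```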

Scraper Settings

Settings controlling the web scraping behavior.
scraper.maxPages
number
default:"1000"
Maximum number of pages to crawl per job.
scraper.maxDepth
number
default:"3"
Maximum link depth to traverse.
scraper.maxConcurrency
number
default:"3"
Number of concurrent page fetches.
scraper.pageTimeoutMs
number
default:"5000"
Timeout for a single page load in milliseconds.
scraper.browserTimeoutMs
number
default:"30000"
Timeout for the browser instance in milliseconds.
scraper.fetcher.maxRetries
number
default:"6"
Number of retries for failed requests.
scraper.fetcher.baseDelayMs
number
default:"1000"
Initial delay for exponential backoff in milliseconds.
scraper.document.maxSize
number
default:"10485760"
Maximum size in bytes for PDF/Office documents (10MB default).
Scraper settings can also be overridden per job with CLI arguments such as --max-pages.
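To see how scraper.fetcher.maxRetries and scraper.fetcher.baseDelayMs combine, the sketch below prints the delay before each retry. It assumes plain doubling from baseDelayMs with no jitter or upper cap; the server's exact backoff formula may differ.

```sh
# Print the delay (ms) before each retry, assuming pure exponential
# doubling: base * 2^attempt for attempt = 0 .. retries-1.
backoff_schedule() {
  base=$1
  retries=$2
  attempt=0
  while [ "$attempt" -lt "$retries" ]; do
    printf '%s\n' $((base * (1 << attempt)))
    attempt=$((attempt + 1))
  done
}

backoff_schedule 1000 6   # defaults: baseDelayMs=1000, maxRetries=6
```

Under that assumption, the defaults wait 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds in total before a request is finally given up.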

GitHub Authentication

Environment variables for authenticating with GitHub when scraping private repositories.
GITHUB_TOKEN
string
GitHub personal access token or fine-grained token. Used for private repo access and higher rate limits.
GH_TOKEN
string
Alternative to GITHUB_TOKEN. Used if GITHUB_TOKEN is not set.
Authentication Resolution Order:
  1. Explicit Authorization header passed in scraper options
  2. GITHUB_TOKEN environment variable
  3. GH_TOKEN environment variable
  4. Local gh CLI authentication (via gh auth token)
If no authentication is available, public repositories are still accessible but with lower rate limits (60 requests/hour vs 5,000 authenticated).
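To check which credential would be picked up from your environment, the helper below mirrors steps 2 to 4 of the resolution order. Step 1, an explicit Authorization header in scraper options, has no shell equivalent and is omitted. The function name is hypothetical; gh auth token is the GitHub CLI's own command for printing the active token.

```sh
# Hypothetical helper mirroring the documented lookup order.
# Prints the first token found, or nothing if none is available.
resolve_github_token() {
  if [ -n "${GITHUB_TOKEN:-}" ]; then
    printf '%s\n' "$GITHUB_TOKEN"
  elif [ -n "${GH_TOKEN:-}" ]; then
    printf '%s\n' "$GH_TOKEN"
  elif command -v gh >/dev/null 2>&1; then
    # Falls back to the local gh CLI login; errors are silenced so an
    # unauthenticated gh simply yields no token.
    gh auth token 2>/dev/null || true
  fi
}
```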

Splitter Settings

Settings for chunking text for vector search.
splitter.minChunkSize
number
default:"500"
Minimum characters per chunk body. Chunks below this threshold are merged with adjacent chunks by the greedy optimizer.
splitter.preferredChunkSize
number
default:"1500"
Soft target for chunk body size in characters. The greedy optimizer stops merging adjacent chunks when combining them would exceed this value.
splitter.maxChunkSize
number
default:"5000"
Hard upper limit for chunk body size in characters. No chunk body will exceed this value.
These size limits apply to the text body of each chunk. Before embedding, a small metadata header (page title, URL, section path) is prepended to each chunk. If your embedding model has a small context window, consider lowering maxChunkSize.
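The interplay of the three limits can be illustrated with a greedy pass over chunk sizes. This is a hypothetical simplification, not the server's optimizer: it only merges, assumes every input chunk already fits within maxChunkSize, and uses the default limits.

```sh
# Default splitter limits, in characters.
MIN_CHUNK_SIZE=500 PREFERRED_CHUNK_SIZE=1500 MAX_CHUNK_SIZE=5000

# Greedy merge over chunk body sizes: combine neighbors while the result
# stays within the preferred size; if either side is below the minimum,
# combine anyway as long as the hard maximum is not exceeded.
merge_chunks() {
  current=""
  for size in "$@"; do
    if [ -z "$current" ]; then
      current=$size
      continue
    fi
    combined=$((current + size))
    if [ "$combined" -le "$PREFERRED_CHUNK_SIZE" ] ||
       { [ "$combined" -le "$MAX_CHUNK_SIZE" ] &&
         { [ "$current" -lt "$MIN_CHUNK_SIZE" ] || [ "$size" -lt "$MIN_CHUNK_SIZE" ]; }; }; then
      current=$combined
    else
      printf '%s\n' "$current"   # emit the finished chunk
      current=$size
    fi
  done
  if [ -n "$current" ]; then printf '%s\n' "$current"; fi
}

merge_chunks 300 400 900 1200 2000   # 700, 900, 1200, 2000 (one per line)
```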

Embedding Settings

Settings for vector embedding generation.
embeddings.batchSize
number
default:"100"
Number of chunks to embed in one request.
embeddings.vectorDimension
number
default:"1536"
Dimension of the vector space (must match your embedding model).
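embeddings.batchSize determines how many embedding requests a job issues: the number of chunks divided by the batch size, rounded up. A hypothetical helper for estimating this with integer arithmetic:

```sh
# Hypothetical estimate: requests = ceil(chunks / batchSize),
# computed as (chunks + batchSize - 1) / batchSize.
embedding_requests() {
  chunks=$1
  batch_size=${2:-100}   # embeddings.batchSize default
  echo $(( (chunks + batch_size - 1) / batch_size ))
}

embedding_requests 2350        # 24 (23 full batches of 100, plus one of 50)
embedding_requests 2350 250    # 10
```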

Database Settings

Internal database settings.
db.migrationMaxRetries
number
default:"5"
Retries for database migrations on startup.

Assembly Settings

Settings for reassembling search results.
assembly.maxChunkDistance
number
default:"3"
Maximum difference in sort_order between two chunks for them to be merged.
assembly.maxParentChainDepth
number
default:"10"
Maximum depth for parent context traversal.
assembly.childLimit
number
default:"3"
Maximum number of child chunks to include.
assembly.precedingSiblingsLimit
number
default:"1"
Number of preceding sibling chunks to include.
assembly.subsequentSiblingsLimit
number
default:"2"
Number of subsequent sibling chunks to include.
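One way to read assembly.maxChunkDistance: when matched chunks from the same page are sorted by sort_order, consecutive chunks whose sort_order values differ by at most the limit end up in the same merged result. The sketch below groups sort_order values under that interpretation, which is an assumption rather than the server's documented algorithm.

```sh
MAX_CHUNK_DISTANCE=3   # assembly.maxChunkDistance default

# Hypothetical grouping: input must be sort_order values in ascending
# order; prints one space-separated group per line.
group_by_distance() {
  prev=""
  line=""
  for order in "$@"; do
    # Start a new group when the gap to the previous chunk is too large.
    if [ -n "$prev" ] && [ $((order - prev)) -gt "$MAX_CHUNK_DISTANCE" ]; then
      printf '%s\n' "$line"
      line=""
    fi
    line="${line:+$line }$order"
    prev=$order
  done
  if [ -n "$line" ]; then printf '%s\n' "$line"; fi
}

group_by_distance 1 2 5 9 10   # "1 2 5" then "9 10"
```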

Provider-Specific Configuration

For detailed configuration of embedding providers (OpenAI, Ollama, Gemini, Azure, AWS), see the Embedding Models guide.
