Configuration File
By default, configuration is stored in your system’s preferences directory.
Example Config File
config.yaml
Using a Custom Config File
You can specify a custom config file with --config or DOCS_MCP_CONFIG.
Explicit config files are treated as read-only. The server will not modify them.
Configuration Priority
Configuration values are merged from multiple sources, with later sources taking precedence:
- Defaults (lowest priority)
- Config File
- Environment Variables
- CLI Arguments (highest priority)
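The layering above can be sketched as a sequence of deep merges, applied from lowest to highest priority. This is a minimal illustration, not the server's actual implementation: the function names `deep_merge` and `resolve_config` are hypothetical, and the assumption that nested sections merge key by key (rather than being replaced wholesale) is not confirmed by the source.

```python
from typing import Any, Dict, Mapping


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `override` applied on top, recursing into nested mappings."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(result.get(key), Mapping):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


def resolve_config(
    defaults: Mapping[str, Any],
    config_file: Mapping[str, Any],
    env: Mapping[str, Any],
    cli: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge the four layers; later layers win (hypothetical helper)."""
    merged: Dict[str, Any] = {}
    for layer in (defaults, config_file, env, cli):
        merged = deep_merge(merged, layer)
    return merged
```

With this sketch, a value given only in the config file survives, while a value given both in the config file and on the command line takes the command-line value.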
Environment Variables
Any configuration setting can be overridden via environment variables using the naming convention:
- Convert camelCase to UPPER_SNAKE_CASE
- Join nested paths with underscores
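The two naming rules can be illustrated with a short helper. This is a sketch under assumptions: `env_var_name` is a hypothetical function, the setting path `scraper.maxPages` is used only as an illustration, and the `prefix` parameter is a placeholder, since this section does not state whether the server prepends a prefix to the generated names.

```python
import re


def env_var_name(path: str, prefix: str = "") -> str:
    """Map a dotted config path to an environment variable name.

    Each segment is converted from camelCase to UPPER_SNAKE_CASE, then the
    segments are joined with underscores. `prefix` is an assumption.
    """
    parts = []
    for segment in path.split("."):
        # Insert an underscore at each lower-to-upper case boundary.
        snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", segment)
        parts.append(snake.upper())
    return prefix + "_".join(parts)


print(env_var_name("server.ports.default"))  # SERVER_PORTS_DEFAULT
print(env_var_name("scraper.maxPages"))      # SCRAPER_MAX_PAGES
```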
Common Environment Variables
Legacy Aliases
Some settings have legacy aliases for convenience:
| Setting | Alias |
|---|---|
| server.ports.default | PORT |
| server.host | HOST |
CLI Arguments
Common settings have dedicated CLI flags.
CLI Configuration Commands
Manage configuration directly from the command line:
- View Config
- Get Value
- Set Value
Configuration Reference
App Settings
General application settings.
Directory for storing databases and logs.
Enable anonymous usage telemetry.
Prevent modification of data (scraping/indexing).
Model to use for vector embeddings. Format: provider:model_name, or just model_name for OpenAI.
Examples:
- openai:text-embedding-3-small (default)
- vertex:text-embedding-004 (Google Cloud Vertex AI)
- gemini:gemini-embedding-exp-03-07 (Google Generative AI)
- aws:amazon.titan-embed-text-v1
- microsoft:text-embedding-ada-002
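The provider:model_name format can be read as "split on the first colon, and fall back to OpenAI when there is no colon". The helper below is a hypothetical sketch of that reading, not the server's parser; in particular, a bare model name that itself contains a colon would be misread as having a provider.

```python
from typing import Tuple


def parse_embedding_model(spec: str, default_provider: str = "openai") -> Tuple[str, str]:
    """Split an embedding model spec into (provider, model_name).

    Splits on the first colon only, so any later colons stay in the model name.
    """
    provider, sep, model = spec.partition(":")
    if not sep:
        # No provider given: the docs say a bare name means OpenAI.
        return default_provider, spec
    return provider, model


print(parse_embedding_model("vertex:text-embedding-004"))  # ('vertex', 'text-embedding-004')
print(parse_embedding_model("text-embedding-3-small"))     # ('openai', 'text-embedding-3-small')
```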
Server Settings
Settings for the API and MCP servers.
Server protocol: stdio, http, or auto.
Host interface to bind to.
MCP protocol heartbeat interval in milliseconds.
Default port for the main server.
Port for the background worker service.
Port for the specific MCP interface.
Port for the web dashboard.
Authentication Settings
Security settings for the HTTP server.
Enable JWT authentication.
OIDC Issuer URL (e.g., Clerk, Auth0).
Expected JWT audience claim.
Scraper Settings
Settings controlling the web scraping behavior.
Maximum number of pages to crawl per job.
Maximum link depth to traverse.
Number of concurrent page fetches.
Timeout for a single page load in milliseconds.
Timeout for the browser instance in milliseconds.
Number of retries for failed requests.
Initial delay for exponential backoff in milliseconds.
Maximum size in bytes for PDF/Office documents (10MB default).
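To show how the retry count and the initial backoff delay interact, the sketch below lists the wait before each retry. It assumes the delay doubles on every attempt; the growth factor is not stated in this section, and `backoff_delays_ms` is a hypothetical helper with illustrative input values.

```python
from typing import List


def backoff_delays_ms(max_retries: int, base_delay_ms: int, factor: float = 2.0) -> List[int]:
    """Delay before each retry: base, base*factor, base*factor^2, ...

    The factor of 2 is an assumption, not a documented value.
    """
    return [int(base_delay_ms * factor ** attempt) for attempt in range(max_retries)]


print(backoff_delays_ms(3, 1000))  # [1000, 2000, 4000]
```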
Scraper settings are often overridden per-job via CLI arguments like --max-pages.
GitHub Authentication
Environment variables for authenticating with GitHub when scraping private repositories.
GitHub personal access token or fine-grained token. Used for private repo access and higher rate limits.
Alternative to GITHUB_TOKEN. Used if GITHUB_TOKEN is not set.
Credentials are resolved from the following sources:
- Explicit Authorization header passed in scraper options
- GITHUB_TOKEN environment variable
- GH_TOKEN environment variable
- Local gh CLI authentication (via gh auth token)
If no authentication is available, public repositories are still accessible but with lower rate limits (60 requests/hour vs 5,000 authenticated).
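One way to picture the credential lookup is a chain of fallbacks that stops at the first source that yields a value. The sketch below assumes the sources are tried in the order listed above and that tokens are sent as a Bearer header; both are assumptions, and `resolve_github_auth` and `gh_cli_token` are hypothetical names rather than the server's API.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional


def gh_cli_token() -> Optional[str]:
    """Ask the local gh CLI for a token; return None if gh is missing or not logged in."""
    try:
        out = subprocess.run(
            ["gh", "auth", "token"],
            capture_output=True, text=True, timeout=5, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return out.stdout.strip() or None


def resolve_github_auth(
    headers: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
    cli_token: Callable[[], Optional[str]] = gh_cli_token,
) -> Optional[str]:
    """Return an Authorization header value, or None for unauthenticated access."""
    # 1. An explicit Authorization header wins.
    for name, value in headers.items():
        if name.lower() == "authorization" and value:
            return value
    # 2. and 3. Environment variables, GITHUB_TOKEN before GH_TOKEN.
    for var in ("GITHUB_TOKEN", "GH_TOKEN"):
        token = env.get(var)
        if token:
            return f"Bearer {token}"
    # 4. Fall back to the gh CLI.
    token = cli_token()
    return f"Bearer {token}" if token else None
```

Passing `env` and `cli_token` as parameters keeps the lookup easy to exercise without touching the real environment or invoking gh.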
Splitter Settings
Settings for chunking text for vector search.
Minimum characters per chunk body. Chunks below this threshold are merged with adjacent chunks by the greedy optimizer.
Soft target for chunk body size in characters. The greedy optimizer splits when combining two chunks would exceed this value.
Hard upper limit for chunk body size in characters. No chunk body will exceed this value.
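The interplay of the three size limits can be sketched as a single greedy pass over adjacent chunks. This is one plausible reading of the descriptions above, not the actual optimizer: `greedy_merge` is a hypothetical function, and the handling of oversized input (a plain character split) is an assumption.

```python
from typing import List


def greedy_merge(chunks: List[str], min_size: int, preferred_size: int, max_size: int) -> List[str]:
    """Greedily merge adjacent chunks under min / preferred / max size limits."""
    # Enforce the hard limit first by splitting any oversized chunk.
    pieces: List[str] = []
    for chunk in chunks:
        pieces.extend(chunk[i:i + max_size] for i in range(0, len(chunk), max_size))

    merged: List[str] = []
    for piece in pieces:
        if merged:
            combined = len(merged[-1]) + len(piece)
            prev_too_small = len(merged[-1]) < min_size
            # Merge while under the soft target, or to rescue an undersized
            # previous chunk as long as the hard limit still holds.
            if combined <= preferred_size or (prev_too_small and combined <= max_size):
                merged[-1] += piece
                continue
        merged.append(piece)

    # An undersized final chunk is folded back into its predecessor if it fits.
    if (len(merged) >= 2 and len(merged[-1]) < min_size
            and len(merged[-2]) + len(merged[-1]) <= max_size):
        tail = merged.pop()
        merged[-1] += tail
    return merged
```

In this sketch the soft target decides when merging normally stops, the minimum only triggers extra merges, and the maximum is never exceeded.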
Embedding Settings
Settings for vector embedding generation.
Number of chunks to embed in one request.
Dimension of the vector space (must match your embedding model).
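The two embedding settings can be illustrated with a batching helper and a dimension check. Both functions are hypothetical sketches; how the server reacts to a dimension mismatch is not described in this section, so raising an error here is an assumption.

```python
from typing import Iterator, List, Sequence


def batches(chunks: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` chunks, one group per request."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(chunks), batch_size):
        yield list(chunks[start:start + batch_size])


def check_dimension(vector: Sequence[float], expected_dim: int) -> None:
    """Raise if an embedding does not match the configured vector dimension."""
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")


print(list(batches(["a", "b", "c", "d", "e"], 2)))  # [['a', 'b'], ['c', 'd'], ['e']]
```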
Database Settings
Internal database settings.
Retries for database migrations on startup.
Assembly Settings
Settings for reassembling search results.
Maximum sort_order difference to merge chunks.
Maximum depth for parent context traversal.
Maximum number of child chunks to include.
Number of preceding sibling chunks to include.
Number of subsequent sibling chunks to include.
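The preceding and subsequent sibling limits can be pictured as a window around the matched chunk. The sketch below assumes siblings are held in document order and that the matched chunk itself is included in the result; `sibling_window` is a hypothetical helper, not part of the server.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sibling_window(siblings: Sequence[T], index: int, preceding: int, subsequent: int) -> List[T]:
    """Return the chunk at `index` plus up to `preceding` earlier and `subsequent` later siblings."""
    start = max(0, index - preceding)
    end = min(len(siblings), index + subsequent + 1)
    return list(siblings[start:end])


print(sibling_window(list("abcdef"), 3, 1, 2))  # ['c', 'd', 'e', 'f']
```

The window is clamped at both ends, so a match near the start or end of a document simply returns fewer siblings.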
