Unified Server Mode
Single process containing all services on one port (default: 6280). This mode combines:
- MCP server accessible via `/mcp` and `/sse` endpoints
- Web interface for job management
- Embedded worker for document processing
- API (tRPC over HTTP) for programmatic access
Use Cases
- Development: Fast iteration with hot reload
- Single Container: Simple production deployments
- Local Indexing: Personal documentation management
- Prototyping: Quick setup and testing
Service Configuration
Services can be selectively enabled via `AppServerConfig`:
src/app/AppServerConfig.ts
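The service toggles described above might be modeled like this; a minimal sketch with illustrative field names (the actual interface in `src/app/AppServerConfig.ts` may differ):

```typescript
// Illustrative sketch only: the real AppServerConfig is defined in
// src/app/AppServerConfig.ts and may use different field names.
interface AppServerConfig {
  enableMcpServer: boolean;    // serve the /mcp and /sse endpoints
  enableWebInterface: boolean; // job management UI
  enableWorker: boolean;       // embedded document processing
  enableApi: boolean;          // tRPC over HTTP
  port: number;
}

// Unified server: everything enabled, on one port (default 6280).
const unifiedConfig: AppServerConfig = {
  enableMcpServer: true,
  enableWebInterface: true,
  enableWorker: true,
  enableApi: true,
  port: 6280,
};
```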
Starting Unified Server
Distributed Mode
Separate coordinator and worker processes for scaling. The coordinator handles interfaces while workers process jobs independently.
Architecture
Communication: Coordinators use tRPC over HTTP for commands and WebSocket for real-time events from workers.
Components
Coordinator:
- Runs MCP server, web interface, and API
- Delegates processing to external workers
- No embedded worker (uses `PipelineClient`)
- Lightweight, stateless interface layer

Workers:
- Execute document processing jobs
- Run `PipelineManager` with embedded workers
- Expose tRPC API for job management
- Independent job recovery and state management
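The coordinator/worker split can be sketched as a factory choice: with an external worker URL the coordinator gets a remote `PipelineClient`, otherwise an embedded `PipelineManager`. A minimal sketch with illustrative method and constructor signatures:

```typescript
// Illustrative sketch: the real classes live in src/pipeline and their
// actual APIs may differ.
interface Pipeline {
  enqueueJob(url: string): Promise<string>; // returns a job id
}

class PipelineManager implements Pipeline {
  async enqueueJob(url: string): Promise<string> {
    // Embedded worker processes the job in-process.
    return `local-job-for-${url}`;
  }
}

class PipelineClient implements Pipeline {
  constructor(private serverUrl: string) {}
  async enqueueJob(url: string): Promise<string> {
    // Would call the worker's tRPC API at this.serverUrl.
    return `remote-job-via-${this.serverUrl}`;
  }
}

// Coordinator stays stateless when an external worker URL is configured.
function createPipeline(serverUrl?: string): Pipeline {
  return serverUrl ? new PipelineClient(serverUrl) : new PipelineManager();
}
```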
Use Cases
- High Volume: Process large documentation sets
- Container Orchestration: Kubernetes, Docker Swarm deployments
- Horizontal Scaling: Add workers based on load
- Resource Isolation: Separate processing from interfaces
Starting Distributed Mode
Coordinator and worker processes are started separately; the coordinator is pointed at workers via `--server-url`.
Protocol Auto-Detection
The system automatically selects the communication protocol based on the execution environment, enabling seamless integration with different tools.
Detection Logic
src/index.ts
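A simplified sketch of the TTY check behind this auto-detection (not the literal contents of `src/index.ts`):

```typescript
// Node sets isTTY only when the stream is attached to a terminal. When an
// AI tool spawns the process with pipes, both are undefined and stdio mode
// is chosen; an interactive terminal gets the full HTTP server.
function detectProtocol(): "stdio" | "http" {
  const interactive = Boolean(process.stdin.isTTY) && Boolean(process.stdout.isTTY);
  return interactive ? "http" : "stdio";
}
```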
Stdio Mode
Automatically selected when stdin/stdout are not TTYs (i.e., not attached to a terminal). Used by VS Code, Claude Desktop, and other AI tools.
- Direct MCP communication via stdin/stdout
- No HTTP server required
- Minimal resource usage
- Newline-delimited JSON-RPC messages over the process pipes
HTTP Mode
Automatically selected when running in an interactive terminal. Provides full web interface and API access.
- Server-Sent Events transport for MCP
- Full web interface at root URL
- API accessible at `/api`
- MCP endpoints at `/mcp` and `/sse`

Endpoints:
- `http://localhost:6280/` - Web UI
- `http://localhost:6280/mcp` - MCP over Streamable HTTP
- `http://localhost:6280/sse` - MCP over Server-Sent Events
- `http://localhost:6280/api` - tRPC API
Manual Override
Protocol can be explicitly set via the `--protocol` flag, bypassing auto-detection.
Configuration
Deployment settings are resolved through a layered configuration system.
Priority Order (highest to lowest):
1. CLI arguments (`--protocol`, `--port`, `--server-url`)
2. Environment variables (`DOCS_MCP_PROTOCOL`, `DOCS_MCP_PORT`)
3. Config file (`docs-mcp.config.yaml` or `DOCS_MCP_CONFIG`)
4. Built-in defaults
Key Configuration Options
| Option | Environment Variable | CLI Flag | Default | Description |
|---|---|---|---|---|
| Protocol | `DOCS_MCP_PROTOCOL` | `--protocol` | `auto` | Transport protocol (`stdio`/`http`) |
| Port | `DOCS_MCP_PORT` | `--port` | `6280` | HTTP server port |
| Server URL | `DOCS_MCP_SERVER_URL` | `--server-url` | - | External worker URL |
| Concurrency | `DOCS_MCP_CONCURRENCY` | - | `3` | Worker concurrency limit |
src/utils/config.ts
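The priority order reduces to "first defined value wins" across layers; a simplified sketch with illustrative names (not the actual code in `src/utils/config.ts`):

```typescript
// Illustrative sketch of layered config resolution:
// CLI flags > environment variables > config file > built-in default.
type Layer = Record<string, string | number | undefined>;

function resolveSetting(
  key: string,
  layers: { cli: Layer; env: Layer; file: Layer },
  fallback: string | number,
): string | number {
  for (const layer of [layers.cli, layers.env, layers.file]) {
    const value = layer[key];
    if (value !== undefined) return value; // first defined value wins
  }
  return fallback;
}

// DOCS_MCP_PORT is set but no --port flag: the env layer wins over the default.
const port = resolveSetting("port", { cli: {}, env: { port: 8080 }, file: {} }, 6280);
```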
Job Recovery
Job recovery behavior differs based on deployment mode to prevent conflicts and ensure data consistency.
Unified Server Mode
Embedded worker recovers pending jobs from database on startup, ensuring no work is lost during restarts.
- Load `QUEUED` and `RUNNING` jobs from database
- Reset `RUNNING` jobs to `QUEUED` state
- Resume processing with original configuration
- Maintain progress history
Configured with `recoverJobs: true` in `PipelineFactory`.
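The recovery steps above amount to a status reset at startup; a minimal sketch with an illustrative job shape:

```typescript
// Illustrative sketch of unified-mode job recovery: interrupted RUNNING
// jobs are reset to QUEUED so the embedded worker can resume them.
type JobStatus = "QUEUED" | "RUNNING" | "COMPLETED" | "FAILED";

interface Job {
  id: string;
  status: JobStatus;
}

function recoverJobs(jobs: Job[]): Job[] {
  return jobs
    // Only QUEUED and RUNNING jobs are eligible for recovery.
    .filter((job) => job.status === "QUEUED" || job.status === "RUNNING")
    // Jobs interrupted mid-run go back to the queue.
    .map((job) =>
      job.status === "RUNNING" ? { ...job, status: "QUEUED" as const } : job,
    );
}

const recovered = recoverJobs([
  { id: "a", status: "RUNNING" },
  { id: "b", status: "QUEUED" },
  { id: "c", status: "COMPLETED" },
]);
// recovered: jobs "a" and "b", both QUEUED
```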
Distributed Mode
Workers handle their own job recovery. Coordinators do not recover jobs to avoid conflicts with worker state.
Workers:
- Each worker maintains independent job state
- Workers recover jobs on startup

Coordinator:
- Remains stateless
- No job recovery (uses `PipelineClient`)
- Delegates all processing to workers
- Queries workers for job status
CLI Commands
CLI commands execute immediately without job recovery to prevent conflicts with concurrent usage.
- `recoverJobs: false` in `PipelineFactory`
- Immediate execution model
- Safe for concurrent CLI operations
- No persistent job state
src/pipeline/PipelineFactory.ts
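How `PipelineFactory` might branch on deployment context; a sketch in which everything except the `recoverJobs` option is illustrative:

```typescript
// Illustrative sketch: recovery is enabled only for the long-lived unified
// server. Coordinators delegate recovery to workers, and one-shot CLI
// commands skip it to avoid clashing with a concurrently running server.
type DeploymentContext = "unified" | "coordinator" | "cli";

interface PipelineOptions {
  recoverJobs: boolean;
}

function pipelineOptionsFor(context: DeploymentContext): PipelineOptions {
  switch (context) {
    case "unified":
      return { recoverJobs: true };  // embedded worker resumes pending jobs
    case "coordinator":
      return { recoverJobs: false }; // workers own recovery; coordinator is stateless
    case "cli":
      return { recoverJobs: false }; // immediate execution, no persistent job state
  }
}
```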
Container Deployment
Single Container
Simple deployment for unified server mode.
Multi-Container (Docker Compose)
Distributed deployment with separate coordinator and workers.
Kubernetes Deployment
Scalable deployment with multiple workers.
Load Balancing
Multiple Workers
Use a load balancer or DNS round-robin in front of multiple worker instances.
Health Checks
Workers can expose health endpoints for monitoring.
Scaling Strategies
- Horizontal: Add more worker containers based on queue depth
  - Add workers when queue depth exceeds threshold
  - Remove workers when idle
  - Auto-scaling based on metrics
- Vertical: Increase worker CPU/memory allocation
  - Increase concurrency limit per worker
  - Allocate more memory for large documents
  - Faster embedding generation with GPU
- Hybrid: Combine both strategies for optimal scaling
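The queue-depth rule from the Horizontal strategy above can be sketched as a simple clamp (the per-worker threshold and bounds are illustrative):

```typescript
// Illustrative autoscaling rule: one worker per `jobsPerWorker` queued jobs,
// clamped to [minWorkers, maxWorkers].
function desiredWorkers(
  queueDepth: number,
  jobsPerWorker: number,
  minWorkers = 1,
  maxWorkers = 10,
): number {
  const needed = Math.ceil(queueDepth / jobsPerWorker);
  return Math.min(maxWorkers, Math.max(minWorkers, needed));
}
```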
Next Steps
- Pipeline System: Learn about job processing architecture
- Configuration: Configure deployment settings
