Overview
The MCP Documentation Server implements privacy-first telemetry to understand usage patterns, monitor performance, and improve user experience. The system is designed with user privacy as the primary concern while providing valuable insights for product development.
Telemetry is enabled by default but can be easily disabled. All data collection is privacy-focused and never includes sensitive information like URLs, document content, search queries, or authentication tokens.
Core Principles
Privacy First
- No Sensitive Data: Never collects URLs, document content, search queries, or authentication tokens
- Metadata Only: Tracks counts, durations, success/failure states, and performance metrics
- Data Sanitization: Built-in utilities ensure no personally identifiable information is collected
- User Control: Simple opt-out mechanisms via CLI flags and environment variables
Minimal Performance Impact
- Synchronous Design: Simple, lightweight telemetry with minimal overhead
- Graceful Degradation: System continues functioning normally when telemetry fails
- No Dependencies: Core application never depends on telemetry functionality
- Installation ID Only: Uses persistent UUID for consistent analytics without user tracking
Simple Architecture
- Direct Analytics: Direct PostHog integration with installation ID as distinct user
- Global Context: Application-level properties automatically included in all events
- Easy Integration: Simple analytics interface for application components
What Data Is Collected
Global Context (All Events)
Automatically included in every event:
- Application Metadata: Version, platform, Node.js version
- Service Configuration: Enabled services, read-only mode, authentication status
- AI Configuration: Embedding provider, model name, dimensions
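One way to picture how global context reaches every event is a small wrapper that merges application-level properties into each payload. This is a minimal sketch, not the server's actual code: the `Analytics` class, the `CapturedEvent` shape, and the injected `send` callback are all hypothetical stand-ins for the real analytics client.

```typescript
// Hypothetical sketch: names and shapes here are illustrative assumptions.
type Properties = Record<string, unknown>;

interface CapturedEvent {
  distinctId: string; // the installation ID acts as the distinct user
  event: string;
  properties: Properties;
}

class Analytics {
  private readonly installationId: string;
  private readonly send: (event: CapturedEvent) => void;
  private globalContext: Properties = {};

  constructor(installationId: string, send: (event: CapturedEvent) => void) {
    this.installationId = installationId;
    this.send = send;
  }

  /** Stores application-level properties that are attached to every event. */
  setGlobalContext(context: Properties): void {
    this.globalContext = { ...context };
  }

  /** Merges global context, event properties, and a timestamp into one payload. */
  track(event: string, properties: Properties = {}): void {
    this.send({
      distinctId: this.installationId,
      event,
      properties: {
        ...this.globalContext,
        ...properties,
        timestamp: new Date().toISOString(),
      },
    });
  }
}
```

Callers then pass only event-specific properties, and the shared metadata is added in one place.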
Event Types
The system tracks these essential event types:
Application Lifecycle
app_started - Application startup with service configuration
app_shutdown - Application shutdown (graceful vs forced)
- Services enabled, port, MCP protocol/transport
- External worker configuration
- CLI command (if started via CLI)
CLI Commands
cli_command - CLI command execution and outcomes
- Command name (e.g., “scrape”, “search”)
- Success/failure status
- Execution duration
Tool Usage
tool_used - Individual MCP tool execution
- Tool name (e.g., “search_docs”, “scrape_docs”)
- Success/failure status
- Execution duration
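The three tool properties listed above (name, success/failure, duration) could be captured by wrapping each tool call. The sketch below is an assumption about how such a wrapper might look; `trackTool` and the `Track` callback type are hypothetical names, and only the `tool_used` event name and its property names come from this document.

```typescript
// Hypothetical wrapper; not the server's actual implementation.
type Track = (event: string, properties: Record<string, unknown>) => void;

/** Runs a tool and reports its name, outcome, and duration, never its inputs. */
async function trackTool<T>(
  track: Track,
  tool: string,
  run: () => Promise<T>,
): Promise<T> {
  const startedAt = Date.now();
  let success = false;
  try {
    const result = await run();
    success = true;
    return result;
  } finally {
    // Reported for both outcomes; the original error still propagates.
    track("tool_used", { tool, success, durationMs: Date.now() - startedAt });
  }
}
```

Because the event is emitted in `finally`, a failing tool still yields exactly one event with `success: false`.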
Pipeline Jobs
pipeline_job_started - Pipeline job initiation
pipeline_job_completed - Successful job completion
pipeline_job_failed - Job failures
- Job ID (anonymous correlation)
- Library name
- Version specified (yes/no)
- Max pages configured
- Queue wait time
- Processing duration
- Pages processed
- Throughput (pages/second)
- Sanitized error messages (failures only)
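The timing metrics above follow from three timestamps and a page count. The following sketch shows the arithmetic under the assumption that timestamps are in milliseconds; `JobTimings` and `computeJobMetrics` are hypothetical names used only for illustration.

```typescript
// Hypothetical input shape; timestamps are assumed to be epoch milliseconds.
interface JobTimings {
  queuedAt: number;
  startedAt: number;
  finishedAt: number;
  pagesProcessed: number;
}

interface JobMetrics {
  queueWaitTimeMs: number;
  durationMs: number;
  pagesProcessed: number;
  throughputPagesPerSecond: number;
}

/** Derives queue wait, processing duration, and throughput for one job. */
function computeJobMetrics(t: JobTimings): JobMetrics {
  const queueWaitTimeMs = t.startedAt - t.queuedAt;
  const durationMs = t.finishedAt - t.startedAt;
  // Guard against division by zero for jobs that finish within the same millisecond.
  const throughputPagesPerSecond =
    durationMs > 0 ? t.pagesProcessed / (durationMs / 1000) : 0;
  return {
    queueWaitTimeMs,
    durationMs,
    pagesProcessed: t.pagesProcessed,
    throughputPagesPerSecond,
  };
}
```

For example, a job queued at 1000, started at 1500, and finished at 5500 with 20 pages gives a 500 ms wait, a 4000 ms duration, and 5 pages/second.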
Error Tracking
- PostHog’s native exception tracking with full stack traces
- Stack traces with source code integration
- Automatic error grouping
- Component identification
- Contextual information
Privacy-Safe Data Collection
The system ensures privacy through comprehensive data sanitization:
URL and Path Sanitization
- Hostname Extraction: Only domain names, never paths or parameters
- Protocol Detection: File vs web URLs, no sensitive paths
- Error Sanitization: Removes sensitive paths and tokens from error messages
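The sanitization rules above can be approximated with a few small helpers. This is a simplified sketch, not the server's built-in utilities: the function names are hypothetical, and the regular expressions are deliberately basic assumptions that a production sanitizer would need to extend.

```typescript
// Illustrative helpers only; names and patterns are assumptions.

/** Keeps only the hostname of a URL; paths and query parameters are dropped. */
function extractHostname(rawUrl: string): string {
  try {
    return new URL(rawUrl).hostname;
  } catch {
    return "invalid"; // Unparseable input is reported without echoing it back.
  }
}

/** Classifies a URL as a local file or a web resource without keeping its path. */
function detectProtocol(rawUrl: string): "file" | "http" | "https" | "other" {
  try {
    const protocol = new URL(rawUrl).protocol; // e.g. "https:"
    if (protocol === "file:") return "file";
    if (protocol === "http:") return "http";
    if (protocol === "https:") return "https";
    return "other";
  } catch {
    return "other";
  }
}

/** Replaces URLs, filesystem paths, and token-like pairs in an error message. */
function sanitizeErrorMessage(message: string): string {
  return message
    .replace(/\b(?:https?|file):\/\/[^\s"']+/gi, "[url]")
    .replace(/(?:[A-Za-z]:)?[\\/][^\s"':]+(?:[\\/][^\s"':]+)+/g, "[path]")
    .replace(/\b(token|key|password|secret)=[^\s&]+/gi, "$1=[redacted]");
}
```

URLs are replaced first so that any credentials embedded in a query string disappear together with the URL.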
Search Query Analysis
- Query Characteristics: Length, word count, special characters
- Pattern Detection: Question vs keyword search
- No Content Storage: Never stores actual search terms
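A query can be reduced to the characteristics listed above without retaining its text. The sketch below assumes a simple heuristic for question detection; `analyzeQuery`, the result shape, and the list of question words are hypothetical.

```typescript
// Hypothetical result shape: contains measurements only, never the query text.
interface QueryCharacteristics {
  length: number;
  wordCount: number;
  hasSpecialCharacters: boolean;
  pattern: "question" | "keyword";
}

/** Summarizes a search query without storing any of its terms. */
function analyzeQuery(query: string): QueryCharacteristics {
  const trimmed = query.trim();
  const words = trimmed.length === 0 ? [] : trimmed.split(/\s+/);
  // Assumed heuristic: a trailing "?" or a leading question word.
  const looksLikeQuestion =
    /\?$/.test(trimmed) ||
    /^(how|what|why|when|where|which|who|can|does|is)\b/i.test(trimmed);
  return {
    length: trimmed.length,
    wordCount: words.length,
    hasSpecialCharacters: /[^A-Za-z0-9\s]/.test(trimmed),
    pattern: looksLikeQuestion ? "question" : "keyword",
  };
}
```

The special-character check is ASCII-only, so non-Latin text counts as special characters here; a real implementation would likely treat that case more carefully.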
CLI Flag Extraction
- Usage Patterns: Which flags are used
- No Values: Never collects flag values or arguments
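Flag extraction of this kind can be sketched as a pass over the argument list that keeps flag names and discards everything else. `extractFlagNames` is a hypothetical helper, and the flags in the usage example are made up for illustration rather than taken from the actual CLI.

```typescript
/**
 * Returns the distinct flag names present in an argument list.
 * Positional arguments and flag values (including "--flag=value" suffixes)
 * are never included in the result.
 */
function extractFlagNames(argv: string[]): string[] {
  const names = new Set<string>();
  for (const arg of argv) {
    // A flag starts with "-" or "--" followed by a letter; "-1" is treated as a value.
    const match = /^(--?[A-Za-z][\w-]*)/.exec(arg);
    if (match) names.add(match[1]);
  }
  return Array.from(names).sort();
}

// Usage with made-up arguments:
const flags = extractFlagNames([
  "scrape", "react", "https://example.com/docs",
  "--max-pages", "50", "--verbose", "--port=8080",
]);
// flags is ["--max-pages", "--port", "--verbose"]
```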
Telemetry Properties Reference
Complete reference of all collected properties:
| Property | Type | Scope | Events | Description |
|---|---|---|---|---|
| Global Context | | | | |
| appVersion | string | Global | All events | Application version from package.json |
| appPlatform | string | Global | All events | Node.js platform (darwin, linux, win32) |
| appNodeVersion | string | Global | All events | Node.js version |
| appServicesEnabled | string[] | Global | All events | List of enabled services |
| appAuthEnabled | boolean | Global | All events | Whether authentication is configured |
| appReadOnly | boolean | Global | All events | Whether app is in read-only mode |
| aiEmbeddingProvider | string | Global | All events | Provider: “openai”, “google”, “aws”, “microsoft” |
| aiEmbeddingModel | string | Global | All events | Model name: “text-embedding-3-small”, etc. |
| aiEmbeddingDimensions | number | Global | All events | Embedding dimensions used |
| Event Context | | | | |
| timestamp | ISO8601 | Event | All events | Event timestamp (automatically added) |
| Application Lifecycle | | | | |
| services | string[] | Event | APP_STARTED | List of enabled services |
| port | number | Event | APP_STARTED | Server port number |
| externalWorker | boolean | Event | APP_STARTED | Whether external worker is configured |
| cliCommand | string | Event | APP_STARTED | CLI command name (when started via CLI) |
| mcpProtocol | enum | Event | APP_STARTED | MCP protocol: “stdio” or “http” |
| mcpTransport | enum | Event | APP_STARTED | MCP transport: “sse” or “streamable” |
| graceful | boolean | Event | APP_SHUTDOWN | Whether shutdown was graceful |
| CLI Commands | | | | |
| cliCommand | string | Event | CLI_COMMAND | CLI command name being executed |
| success | boolean | Event | CLI_COMMAND | Whether command execution succeeded |
| durationMs | number | Event | CLI_COMMAND | Command execution duration |
| Tool Usage | | | | |
| tool | string | Event | TOOL_USED | Tool name being executed |
| success | boolean | Event | TOOL_USED | Whether tool execution succeeded |
| durationMs | number | Event | TOOL_USED | Tool execution duration |
| Pipeline Jobs | | | | |
| jobId | string | Event | PIPELINE_JOB_* | Anonymous job identifier for correlation |
| library | string | Event | PIPELINE_JOB_* | Library being processed |
| hasVersion | boolean | Event | PIPELINE_JOB_* | Whether library version was specified |
| maxPagesConfigured | number | Event | PIPELINE_JOB_* | Maximum pages configured |
| queueWaitTimeMs | number | Event | JOB_STARTED | Time job waited in queue before starting |
| durationMs | number | Event | JOB_COMPLETED | Job execution duration |
| pagesProcessed | number | Event | JOB_COMPLETED | Total pages successfully processed |
| throughputPagesPerSecond | number | Event | JOB_COMPLETED | Processing throughput |
| hasError | boolean | Event | JOB_FAILED | Whether job had errors (always true) |
| errorMessage | string | Event | JOB_FAILED | Sanitized error message |
Configuration and Control
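Telemetry opt-in/out is set through appConfig.app.telemetryEnabled. Sources are listed from lowest to highest precedence, so each later source overrides the ones before it: defaults → docs-mcp.config.yaml (or DOCS_MCP_CONFIG) → legacy environment variables → generic environment variable DOCS_MCP_<KEY> → CLI flags for the current run.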
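The precedence chain can be read as a series of optional overrides applied in order. The sketch below assumes that later sources win over earlier ones and that each source has already been parsed into an optional boolean; the `ConfigSources` shape and `resolveTelemetryEnabled` are hypothetical and do not reflect the server's actual configuration loader.

```typescript
// Hypothetical shape: each source may or may not set the value.
interface TelemetryLayer {
  telemetryEnabled?: boolean;
}

interface ConfigSources {
  configFile?: TelemetryLayer; // docs-mcp.config.yaml (or DOCS_MCP_CONFIG)
  legacyEnv?: TelemetryLayer;  // legacy environment variables
  genericEnv?: TelemetryLayer; // generic DOCS_MCP_<KEY> environment variables
  cliFlags?: TelemetryLayer;   // flags for the current run
}

/** Applies each source in order; the last one that sets a value wins. */
function resolveTelemetryEnabled(sources: ConfigSources): boolean {
  const ordered = [
    sources.configFile,
    sources.legacyEnv,
    sources.genericEnv,
    sources.cliFlags,
  ];
  let enabled = true; // Default: telemetry is enabled unless a source says otherwise.
  for (const layer of ordered) {
    if (layer && layer.telemetryEnabled !== undefined) {
      enabled = layer.telemetryEnabled;
    }
  }
  return enabled;
}
```

Under this reading, a CLI flag for the current run overrides whatever the config file or environment specifies.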
Installation ID
The system uses a persistent installation identifier for consistent analytics:
- Storage: UUID-based identifier stored in installation.id
- Location: Standard user data directory (~/.local/share/docs-mcp-server/)
- Customization: Override with DOCS_MCP_STORE_PATH (useful for containers)
- Purpose: Consistent identification across runs without user tracking
- Privacy: Never transmitted outside telemetry events, fully under user control
The installation ID is generated once and reused across application runs. It provides consistent analytics without tracking individual users. You can delete this file at any time to generate a new ID.
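The generate-once, reuse-afterwards behavior could look roughly like the following. This is a sketch under stated assumptions: the function names are hypothetical, the default directory is simply the path quoted above, and the fallback for unwritable directories is an illustrative choice rather than documented behavior.

```typescript
import { randomUUID } from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

/** Picks the store directory: DOCS_MCP_STORE_PATH if set, else the user data dir. */
function resolveStoreDir(env: Record<string, string | undefined>): string {
  return (
    env.DOCS_MCP_STORE_PATH ??
    path.join(os.homedir(), ".local", "share", "docs-mcp-server")
  );
}

/** Reads installation.id if it holds a valid UUID; otherwise creates a new one. */
function loadOrCreateInstallationId(storeDir: string): string {
  const idFile = path.join(storeDir, "installation.id");
  try {
    const existing = fs.readFileSync(idFile, "utf8").trim();
    if (UUID_PATTERN.test(existing)) return existing;
  } catch {
    // File is missing or unreadable: fall through and generate a new ID.
  }
  const id = randomUUID();
  try {
    fs.mkdirSync(storeDir, { recursive: true });
    fs.writeFileSync(idFile, id, "utf8");
  } catch {
    // Assumed fallback: on a read-only store, use the ID for this run only.
  }
  return id;
}
```

Deleting the file makes the next call generate a fresh UUID, which matches the reset behavior described above.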
Runtime Behavior
Graceful Degradation
- Telemetry failures never affect application functionality
- Simple fallback to no-op behavior when disabled
- No crashes or errors from telemetry issues
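Graceful degradation of this kind is commonly achieved with a no-op client plus a wrapper that swallows errors. The sketch below illustrates that pattern; `AnalyticsClient`, `noopAnalytics`, and `createSafeAnalytics` are hypothetical names, not the server's actual interfaces.

```typescript
// Hypothetical minimal interface for an analytics client.
interface AnalyticsClient {
  track(event: string, properties?: Record<string, unknown>): void;
}

/** Used when telemetry is disabled: every call does nothing. */
const noopAnalytics: AnalyticsClient = {
  track(): void {
    // Intentionally empty.
  },
};

/** Wraps a client so that telemetry failures never reach the caller. */
function createSafeAnalytics(
  enabled: boolean,
  inner: AnalyticsClient,
): AnalyticsClient {
  if (!enabled) return noopAnalytics;
  return {
    track(event, properties): void {
      try {
        inner.track(event, properties);
      } catch {
        // Swallow the error: the application must keep working without telemetry.
      }
    },
  };
}
```

Application code calls `track` unconditionally and never needs to check whether telemetry is enabled or healthy.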
Performance Impact
- Synchronous event tracking with minimal overhead
- No blocking operations or network delays
- Lightweight data structures
Privacy Compliance
Data Minimization
The system implements strict data minimization:
- Only essential data collection for core insights
- Installation ID as the only persistent identifier
- No user tracking or cross-session correlation beyond installation
- Minimal data retention with focus on current patterns
Transparency
Users have clear control and visibility:
- Simple opt-out mechanisms
- Clear documentation of collected data types
- No hidden or complex data collection
- Installation ID stored locally and under user control
Security
Telemetry data protection:
- Encrypted transmission to analytics service
- No sensitive local storage beyond installation ID
- Simple UUID-based identification system
- Essential data sanitization to prevent information leakage
Analytics and Insights
The telemetry system provides valuable insights for product development:
Usage Analytics
- Tool usage patterns across different interfaces
- Feature adoption trends
- Session frequency and engagement metrics
Performance Monitoring
- Error rates and common failure patterns
- Processing performance metrics
- System stability and reliability trends
Product Intelligence
- Interface preference trends (CLI vs MCP vs Web)
- Tool popularity and usage patterns
- Error categorization for improvement priorities
All analytics are aggregated and analyzed in a privacy-preserving manner. Individual installation patterns are used only to improve the product, never for user tracking or profiling.
