
Overview

The MCP Documentation Server implements privacy-first telemetry to understand usage patterns, monitor performance, and improve user experience. The system is designed with user privacy as the primary concern while providing valuable insights for product development.
Telemetry is enabled by default but can be easily disabled. All data collection is privacy-focused and never includes sensitive information like URLs, document content, search queries, or authentication tokens.

Core Principles

Privacy First

  • No Sensitive Data: Never collects URLs, document content, search queries, or authentication tokens
  • Metadata Only: Tracks counts, durations, success/failure states, and performance metrics
  • Data Sanitization: Built-in utilities ensure no personally identifiable information is collected
  • User Control: Simple opt-out mechanisms via CLI flags and environment variables

Minimal Performance Impact

  • Synchronous Design: Simple, lightweight telemetry with minimal overhead
  • Graceful Degradation: System continues functioning normally when telemetry fails
  • No Dependencies: Core application never depends on telemetry functionality
  • Installation ID Only: Uses a persistent UUID for consistent analytics without tracking individual users

Simple Architecture

  • Direct Analytics: Direct PostHog integration, with the installation ID used as the distinct user ID
  • Global Context: Application-level properties automatically included in all events
  • Easy Integration: Simple analytics interface for application components
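To make this architecture concrete, here is a minimal TypeScript sketch of such an analytics interface. The `Analytics` class, its `track()` method and the `AnalyticsClient` interface are hypothetical names. `CaptureMessage` mirrors the `{ distinctId, event, properties }` object that posthog-node's `capture()` accepts, but it is declared locally so the sketch runs without the dependency. It shows the installation ID being sent as the distinct user and the global context being merged into every event.

```typescript
// Local stand-in for the object posthog-node's capture() accepts.
interface CaptureMessage {
  distinctId: string;
  event: string;
  properties: Record<string, unknown>;
}

// Anything that can capture events (a PostHog client in practice).
interface AnalyticsClient {
  capture(message: CaptureMessage): void;
}

// Hypothetical analytics facade used by application components.
class Analytics {
  private readonly client: AnalyticsClient;
  private readonly installationId: string;
  private readonly globalContext: Record<string, unknown>;

  constructor(
    client: AnalyticsClient,
    installationId: string,
    globalContext: Record<string, unknown>,
  ) {
    this.client = client;
    this.installationId = installationId;
    this.globalContext = globalContext;
  }

  // Sends one event: the installation ID is the distinct user, and the
  // global context is merged with the event's own properties.
  track(event: string, properties: Record<string, unknown> = {}): void {
    this.client.capture({
      distinctId: this.installationId,
      event,
      properties: { ...this.globalContext, ...properties },
    });
  }
}
```

Event-specific properties are spread after the global context, so in this sketch an event property wins on a name clash; the actual server may resolve that differently.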

What Data Is Collected

Global Context (All Events)

Automatically included in every event:
  • Application Metadata: Version, platform, Node.js version
  • Service Configuration: Enabled services, read-only mode, authentication status
  • AI Configuration: Embedding provider, model name, dimensions

Event Types

The system tracks these essential event types:
  • app_started - Application startup with service configuration
    Properties:
      • Services enabled, port, MCP protocol/transport
      • External worker configuration
      • CLI command (if started via CLI)
  • app_shutdown - Application shutdown
    Properties:
      • Whether the shutdown was graceful or forced
  • cli_command - CLI command execution and outcomes
    Properties:
      • Command name (e.g., “scrape”, “search”)
      • Success/failure status
      • Execution duration
  • tool_used - Individual MCP tool execution
    Properties:
      • Tool name (e.g., “search_docs”, “scrape_docs”)
      • Success/failure status
      • Execution duration
  • pipeline_job_started - Pipeline job initiation
  • pipeline_job_completed - Successful job completion
  • pipeline_job_failed - Job failures
    Properties (pipeline events):
      • Job ID (anonymous correlation)
      • Library name
      • Version specified (yes/no)
      • Max pages configured
      • Queue wait time
      • Processing duration
      • Pages processed
      • Throughput (pages/second)
      • Sanitized error messages (failures only)
  • Exceptions - PostHog’s native exception tracking with full stack traces
    Properties:
      • Stack traces with source code integration
      • Automatic error grouping
      • Component identification
      • Contextual information
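The tool_used properties (tool name, success/failure, duration) can be captured by wrapping each tool handler. The sketch below assumes a hypothetical `withToolTelemetry` helper and `TrackFn` callback; it illustrates the idea and is not the server's actual code.

```typescript
// Properties of a tool_used event, named as in the property table.
interface ToolUsedProperties {
  tool: string;
  success: boolean;
  durationMs: number;
}

type TrackFn = (event: string, properties: ToolUsedProperties) => void;

// Hypothetical helper: runs a tool handler, times it, and records a
// tool_used event whether the handler resolves or rejects.
async function withToolTelemetry<T>(
  tool: string,
  handler: () => Promise<T>,
  track: TrackFn,
): Promise<T> {
  const start = performance.now();
  let success = false;
  try {
    const result = await handler();
    success = true;
    return result;
  } finally {
    // Only metadata is recorded: tool name, outcome, and elapsed time.
    track("tool_used", {
      tool,
      success,
      durationMs: Math.round(performance.now() - start),
    });
  }
}
```

Because the event is emitted in a `finally` block, a failed execution is recorded with `success: false` while the original error still propagates to the caller.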

Privacy-Safe Data Collection

The system ensures privacy through comprehensive data sanitization:

URL and Path Sanitization

  • Hostname Extraction: Only domain names, never paths or parameters
  • Protocol Detection: File vs web URLs, no sensitive paths
  • Error Sanitization: Removes sensitive paths and tokens from error messages
// Input: https://docs.example.com/internal/secret-project/api.html
// Collected: docs.example.com

// Input: file:///home/user/documents/private.md
// Collected: file://
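A minimal sketch of these sanitization rules, assuming hypothetical `sanitizeUrlForTelemetry` and `sanitizeErrorMessage` helpers and a hypothetical `"invalid-url"` placeholder. URL handling relies on the standard WHATWG `URL` class; the regular expressions for error messages are illustrative and would need tuning against real error formats.

```typescript
// Hypothetical helper: reduce a URL to what telemetry may record.
// Web URLs keep only the hostname; file URLs collapse to "file://".
function sanitizeUrlForTelemetry(input: string): string {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    // Never echo unparseable input back into an event.
    return "invalid-url";
  }
  return url.protocol === "file:" ? "file://" : url.hostname;
}

// Hypothetical helper: redact URLs, bearer tokens and absolute paths
// (POSIX or Windows drive paths) from an error message.
function sanitizeErrorMessage(message: string): string {
  return message
    .replace(/\b[a-z][a-z0-9+.-]*:\/\/\S+/gi, "[url]")
    .replace(/\bBearer\s+\S+/gi, "Bearer [token]")
    .replace(/(^|[\s'"(])(?:[A-Za-z]:\\|\/)[^\s'"]+/g, "$1[path]");
}
```

URLs are redacted first so that their paths are not matched a second time by the path rule.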

Search Query Analysis

  • Query Characteristics: Length, word count, special characters
  • Pattern Detection: Question vs keyword search
  • No Content Storage: Never stores actual search terms
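Query characteristics can be derived without retaining the query itself. In this sketch the helper name `analyzeSearchQuery`, the returned field names and the list of question words are all assumptions for illustration.

```typescript
// Shape-only metadata about a search query; the text itself is not kept.
interface QueryCharacteristics {
  length: number;
  wordCount: number;
  hasSpecialChars: boolean;
  isQuestion: boolean;
}

// Hypothetical helper: the query string is inspected and then discarded.
function analyzeSearchQuery(query: string): QueryCharacteristics {
  const trimmed = query.trim();
  const words = trimmed.length === 0 ? [] : trimmed.split(/\s+/);
  return {
    length: trimmed.length,
    wordCount: words.length,
    // Anything other than letters, digits and whitespace counts as special.
    hasSpecialChars: /[^\p{L}\p{N}\s]/u.test(trimmed),
    // Question vs keyword search: trailing "?" or a leading question word.
    isQuestion:
      trimmed.endsWith("?") ||
      /^(how|what|why|when|where|which|who|can|does|is)\b/i.test(trimmed),
  };
}
```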

CLI Flag Extraction

  • Usage Patterns: Which flags are used
  • No Values: Never collects flag values or arguments
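One way to record which flags were used while discarding their values, sketched with a hypothetical `extractCliFlags` helper. It handles both the `--flag value` and `--flag=value` forms; apart from `--no-telemetry`, the flag names in the check below are illustrative only.

```typescript
// Hypothetical helper: keep only flag names from an argument list.
// Positional arguments and flag values are dropped entirely.
function extractCliFlags(argv: readonly string[]): string[] {
  const flags = new Set<string>();
  for (const arg of argv) {
    // Skip positionals, bare "-"/"--", and negative numbers such as "-1".
    if (!arg.startsWith("-") || arg === "-" || arg === "--" || /^-\d/.test(arg)) {
      continue;
    }
    // "--flag=value" keeps only "--flag".
    flags.add(arg.split("=", 1)[0]);
  }
  return Array.from(flags);
}
```

Using a `Set` also deduplicates repeated flags, so the result describes which flags appeared rather than how the user typed them.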

Telemetry Properties Reference

Complete reference of all collected properties:
| Property | Type | Scope | Events | Description |
| --- | --- | --- | --- | --- |
| **Global Context** | | | | |
| appVersion | string | Global | All events | Application version from package.json |
| appPlatform | string | Global | All events | Node.js platform (darwin, linux, win32) |
| appNodeVersion | string | Global | All events | Node.js version |
| appServicesEnabled | string[] | Global | All events | List of enabled services |
| appAuthEnabled | boolean | Global | All events | Whether authentication is configured |
| appReadOnly | boolean | Global | All events | Whether the app is in read-only mode |
| aiEmbeddingProvider | string | Global | All events | Provider: “openai”, “google”, “aws”, “microsoft” |
| aiEmbeddingModel | string | Global | All events | Model name: “text-embedding-3-small”, etc. |
| aiEmbeddingDimensions | number | Global | All events | Embedding dimensions used |
| **Event Context** | | | | |
| timestamp | ISO 8601 string | Event | All events | Event timestamp (added automatically) |
| **Application Lifecycle** | | | | |
| services | string[] | Event | APP_STARTED | List of enabled services |
| port | number | Event | APP_STARTED | Server port number |
| externalWorker | boolean | Event | APP_STARTED | Whether an external worker is configured |
| cliCommand | string | Event | APP_STARTED | CLI command name (when started via CLI) |
| mcpProtocol | enum | Event | APP_STARTED | MCP protocol: “stdio” or “http” |
| mcpTransport | enum | Event | APP_STARTED | MCP transport: “sse” or “streamable” |
| graceful | boolean | Event | APP_SHUTDOWN | Whether the shutdown was graceful |
| **CLI Commands** | | | | |
| cliCommand | string | Event | CLI_COMMAND | Name of the CLI command being executed |
| success | boolean | Event | CLI_COMMAND | Whether the command succeeded |
| durationMs | number | Event | CLI_COMMAND | Command execution duration in milliseconds |
| **Tool Usage** | | | | |
| tool | string | Event | TOOL_USED | Name of the tool being executed |
| success | boolean | Event | TOOL_USED | Whether the tool execution succeeded |
| durationMs | number | Event | TOOL_USED | Tool execution duration in milliseconds |
| **Pipeline Jobs** | | | | |
| jobId | string | Event | PIPELINE_JOB_* | Anonymous job identifier for correlation |
| library | string | Event | PIPELINE_JOB_* | Library being processed |
| hasVersion | boolean | Event | PIPELINE_JOB_* | Whether a library version was specified |
| maxPagesConfigured | number | Event | PIPELINE_JOB_* | Maximum pages configured |
| queueWaitTimeMs | number | Event | PIPELINE_JOB_STARTED | Time in milliseconds the job waited in the queue before starting |
| durationMs | number | Event | PIPELINE_JOB_COMPLETED | Job execution duration in milliseconds |
| pagesProcessed | number | Event | PIPELINE_JOB_COMPLETED | Total pages successfully processed |
| throughputPagesPerSecond | number | Event | PIPELINE_JOB_COMPLETED | Processing throughput in pages per second |
| hasError | boolean | Event | PIPELINE_JOB_FAILED | Whether the job had errors (always true) |
| errorMessage | string | Event | PIPELINE_JOB_FAILED | Sanitized error message |
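The pipeline properties map naturally onto typed payloads. This sketch assumes throughput is pages processed divided by job duration in seconds; `PipelineJobBase`, `PipelineJobCompleted` and `buildJobCompleted` are hypothetical names, and only the property names come from the table.

```typescript
// Properties shared by all PIPELINE_JOB_* events.
interface PipelineJobBase {
  jobId: string;
  library: string;
  hasVersion: boolean;
  maxPagesConfigured: number;
}

// Extra properties carried by a completed job.
interface PipelineJobCompleted extends PipelineJobBase {
  durationMs: number;
  pagesProcessed: number;
  throughputPagesPerSecond: number;
}

// Hypothetical helper: build the completion payload and derive throughput
// from pages processed and elapsed seconds.
function buildJobCompleted(
  base: PipelineJobBase,
  durationMs: number,
  pagesProcessed: number,
): PipelineJobCompleted {
  const seconds = durationMs / 1000;
  return {
    ...base,
    durationMs,
    pagesProcessed,
    // Guard against division by zero for instantaneous jobs.
    throughputPagesPerSecond: seconds > 0 ? pagesProcessed / seconds : 0,
  };
}
```

For example, 120 pages processed in 60,000 ms gives a throughput of 2 pages per second.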

Configuration and Control

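Telemetry is controlled by the appConfig.app.telemetryEnabled setting. Its value is resolved from the following sources, each overriding the ones before it: defaults → docs-mcp.config.yaml (or the file given by DOCS_MCP_CONFIG) → legacy environment variables → generic DOCS_MCP_<KEY> environment variables → CLI flags for the current run.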
# Disable telemetry for current session
npx docs-mcp-server --no-telemetry
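The precedence chain behaves like a last-writer-wins merge over configuration layers. The sketch below models each layer as an already-parsed value; `ConfigLayer`, `ResolvedSetting` and `resolveTelemetryEnabled` are hypothetical, and the actual YAML, environment and flag parsing is omitted.

```typescript
// One configuration source, already parsed; telemetryEnabled is undefined
// when that source does not mention the setting.
interface ConfigLayer {
  source: string;
  telemetryEnabled?: boolean;
}

interface ResolvedSetting {
  enabled: boolean;
  source: string;
}

// Hypothetical resolver: layers are ordered from lowest to highest
// precedence, so the last layer that sets the value wins.
function resolveTelemetryEnabled(layers: readonly ConfigLayer[]): ResolvedSetting {
  // Telemetry is enabled by default.
  let resolved: ResolvedSetting = { enabled: true, source: "defaults" };
  for (const layer of layers) {
    if (layer.telemetryEnabled !== undefined) {
      resolved = { enabled: layer.telemetryEnabled, source: layer.source };
    }
  }
  return resolved;
}

// Example: the config file enables telemetry, but the CLI flag turns it off.
const setting = resolveTelemetryEnabled([
  { source: "docs-mcp.config.yaml", telemetryEnabled: true },
  { source: "legacy env" },
  { source: "DOCS_MCP_<KEY> env" },
  { source: "--no-telemetry", telemetryEnabled: false },
]);
console.log(setting.enabled); // false
```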

Installation ID

The system uses a persistent installation identifier for consistent analytics:
  • Storage: A UUID stored in a file named installation.id
  • Location: The standard user data directory (for example, ~/.local/share/docs-mcp-server/ on Linux)
  • Customization: Override with DOCS_MCP_STORE_PATH (useful for containers)
  • Purpose: Consistent identification across runs without user tracking
  • Privacy: Never transmitted outside telemetry events, fully under user control
The installation ID is generated once and reused across application runs. It provides consistent analytics without tracking individual users. You can delete this file at any time to generate a new ID.
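A minimal sketch of the read-or-create logic, assuming the ID is a random (version 4) UUID written as plain text to installation.id. `getOrCreateInstallationId` is a hypothetical name, and the caller is assumed to supply the platform's default data directory; the demo call uses a temporary directory instead.

```typescript
import { randomUUID } from "node:crypto";
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: return the persisted installation ID, creating it on
// first run. DOCS_MCP_STORE_PATH, when set, replaces the default directory.
function getOrCreateInstallationId(defaultStoreDir: string): string {
  const storeDir = process.env.DOCS_MCP_STORE_PATH ?? defaultStoreDir;
  const idFile = join(storeDir, "installation.id");

  if (existsSync(idFile)) {
    const existing = readFileSync(idFile, "utf8").trim();
    if (existing.length > 0) return existing; // reuse across runs
  }

  // First run, or the file was deleted or emptied: generate and persist a new ID.
  const id = randomUUID();
  mkdirSync(storeDir, { recursive: true });
  writeFileSync(idFile, id, "utf8");
  return id;
}

// Demo only: a temporary directory stands in for the user data directory.
const demoId = getOrCreateInstallationId(join(tmpdir(), "docs-mcp-server-demo"));
console.log(`installation ID: ${demoId}`);
```

Deleting the file simply sends the next call down the creation path, which matches the reset behavior described above.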

Runtime Behavior

Graceful Degradation

  • Telemetry failures never affect application functionality
  • Simple fallback to no-op behavior when disabled
  • No crashes or errors from telemetry issues
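The two degradation rules above (a no-op when disabled, swallowed failures when enabled) can be sketched as a small wrapper. `Telemetry`, `noopTelemetry` and `createSafeTelemetry` are hypothetical names used only for illustration.

```typescript
type EventProperties = Record<string, unknown>;

interface Telemetry {
  track(event: string, properties?: EventProperties): void;
}

// Used when telemetry is disabled: every call does nothing.
const noopTelemetry: Telemetry = { track: () => {} };

// Hypothetical wrapper: exceptions thrown by the underlying sink are
// swallowed so telemetry problems never reach the caller.
function createSafeTelemetry(enabled: boolean, sink: Telemetry): Telemetry {
  if (!enabled) return noopTelemetry;
  return {
    track(event, properties) {
      try {
        sink.track(event, properties);
      } catch {
        // Deliberately ignored: telemetry failures must not affect the app.
      }
    },
  };
}
```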

Performance Impact

  • Event tracking calls are synchronous and lightweight
  • Recording an event never blocks the application or waits on network requests
  • Lightweight data structures

Privacy Compliance

Data Minimization

The system implements strict data minimization:
  • Only essential data collection for core insights
  • Installation ID as the only persistent identifier
  • No user tracking, and no cross-session correlation beyond the installation ID
  • Minimal data retention with focus on current patterns

Transparency

Users have clear control and visibility:
  • Simple opt-out mechanisms
  • Clear documentation of collected data types
  • No hidden or complex data collection
  • Installation ID stored locally and under user control

Security

Telemetry data protection:
  • Encrypted transmission to analytics service
  • No sensitive data stored locally; only the installation ID is persisted
  • Simple UUID-based identification system
  • Essential data sanitization to prevent information leakage

Analytics and Insights

The telemetry system provides valuable insights for product development:

Usage Analytics

  • Tool usage patterns across different interfaces
  • Feature adoption trends
  • Session frequency and engagement metrics

Performance Monitoring

  • Error rates and common failure patterns
  • Processing performance metrics
  • System stability and reliability trends

Product Intelligence

  • Interface preference trends (CLI vs MCP vs Web)
  • Tool popularity and usage patterns
  • Error categorization for improvement priorities
All analytics are aggregated and analyzed in a privacy-preserving manner. Individual installation patterns are used only to improve the product, never for user tracking or profiling.
