Introduction
The Secure MCP Gateway provides enterprise-grade observability through a plugin-based telemetry system. The architecture supports structured logging, distributed tracing, and metrics collection to give you complete visibility into gateway operations, security events, and performance characteristics.
Telemetry Architecture
Plugin-Based System
The gateway uses a flexible plugin architecture for telemetry providers.
Key Components
1. Telemetry Provider Interface
All providers implement the base TelemetryProvider interface:
src/secure_mcp_gateway/plugins/telemetry/base.py
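The contents of base.py are not reproduced here, so the following is only a minimal sketch of what a provider contract of this kind could look like. The method names (`initialize`, `create_logger`), the `name` property, and the `StdoutSketchProvider` class are illustrative assumptions, not the gateway's actual API.

```python
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict


class TelemetryProvider(ABC):
    """Hypothetical provider contract; the real interface may differ."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Identifier used to select the provider in configuration."""

    @abstractmethod
    def initialize(self, config: Dict[str, Any]) -> bool:
        """Prepare the provider and report whether it is usable."""

    @abstractmethod
    def create_logger(self, name: str) -> Any:
        """Return a logger-like object."""


class StdoutSketchProvider(TelemetryProvider):
    """Toy provider that logs through the standard library."""

    @property
    def name(self) -> str:
        return "stdout"

    def initialize(self, config: Dict[str, Any]) -> bool:
        # Fall back to INFO when the configured level name is unknown.
        level_name = str(config.get("level", "INFO")).upper()
        logging.basicConfig(level=getattr(logging, level_name, logging.INFO))
        return True

    def create_logger(self, name: str) -> logging.Logger:
        return logging.getLogger(name)
```

Because the base class is abstract, a provider that forgets one of the methods fails at instantiation time rather than at first use.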
2. Telemetry Config Manager
Centralized singleton that manages the telemetry lifecycle:
src/secure_mcp_gateway/plugins/telemetry/config_manager.py
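To make the singleton idea concrete, here is a small sketch of a manager that registers providers and tracks the active one. The class name `TelemetryConfigManagerSketch` and its methods are hypothetical and chosen for illustration; the real config_manager.py may be organized differently.

```python
import threading
from typing import Any, Dict, Optional


class TelemetryConfigManagerSketch:
    """Illustrative lifecycle manager; all names here are assumptions."""

    _instance: Optional["TelemetryConfigManagerSketch"] = None
    _lock = threading.Lock()

    def __init__(self) -> None:
        self._providers: Dict[str, Any] = {}
        self._active: Optional[str] = None

    @classmethod
    def instance(cls) -> "TelemetryConfigManagerSketch":
        # Double-checked locking so concurrent callers share one manager.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

    def register(self, name: str, provider: Any) -> None:
        self._providers[name] = provider

    def activate(self, name: str) -> Any:
        if name not in self._providers:
            raise KeyError(f"unknown telemetry provider: {name}")
        self._active = name
        return self._providers[name]

    def active_provider(self) -> Optional[Any]:
        return self._providers.get(self._active) if self._active else None
```

Every caller that goes through `instance()` sees the same registry, which is what lets the rest of the gateway ask for "the current provider" without passing it around.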
3. Lazy Logger Pattern
To avoid circular imports during initialization, the gateway uses a lazy logger:
src/secure_mcp_gateway/utils.py:63-92
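The referenced lines of utils.py are not shown here. One common way to implement a lazy logger is a proxy that builds the real logger on first use, sketched below. The names `LazyLogger` and `_build_logger` are assumptions for illustration only.

```python
import logging
from typing import Any, Callable, Optional


class LazyLogger:
    """Defers logger creation until the first attribute access."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._logger: Optional[Any] = None

    def __getattr__(self, attr: str) -> Any:
        # Only invoked for attributes not found normally, e.g. .info or .error.
        if self._logger is None:
            self._logger = self._factory()
        return getattr(self._logger, attr)


def _build_logger() -> logging.Logger:
    # In a real codebase the import of the telemetry module would live here,
    # so it runs at first use rather than at module import time.
    return logging.getLogger("gateway.sketch")


logger = LazyLogger(_build_logger)
```

Modules can import `logger` at the top level without triggering the telemetry import chain; the factory runs only when something is actually logged.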
Configuration
Telemetry Config Structure
Telemetry is configured in enkrypt_mcp_config.json:
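The original configuration snippet is not reproduced here. As a rough sketch of how a telemetry section might be read and validated, the example below merges a JSON section over defaults. The key names (`telemetry`, `provider`, `enabled`, `endpoint`, `insecure`, `level`) are placeholders inferred from the option descriptions, not the gateway's actual schema.

```python
import json
from typing import Any, Dict

# Placeholder defaults; key names are assumed, not taken from the gateway schema.
DEFAULTS: Dict[str, Any] = {
    "provider": "stdout",
    "enabled": False,
    "endpoint": "http://localhost:4317",
    "insecure": True,
    "level": "INFO",
}

VALID_PROVIDERS = {"opentelemetry", "stdout"}
VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def load_telemetry_settings(raw_json: str) -> Dict[str, Any]:
    """Merge a hypothetical "telemetry" section over defaults and validate it."""
    document = json.loads(raw_json)
    settings = {**DEFAULTS, **document.get("telemetry", {})}
    if settings["provider"] not in VALID_PROVIDERS:
        raise ValueError(f"unsupported provider: {settings['provider']}")
    settings["level"] = str(settings["level"]).upper()
    if settings["level"] not in VALID_LEVELS:
        raise ValueError(f"unsupported log level: {settings['level']}")
    return settings
```

Validating the provider name and log level up front means a typo in the config surfaces at startup instead of silently disabling telemetry.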
Configuration Options
- Telemetry provider to use. Available: opentelemetry, stdout
- Enable or disable telemetry. When disabled, uses no-op logger/tracer.
- OTLP endpoint URL for the OpenTelemetry Collector (gRPC or HTTP)
- Use insecure connection (no TLS). Set to false for production with TLS.
- Service name for resource attributes in telemetry data
- Job name for metrics scraping and resource attributes
- Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL
Available Providers
OpenTelemetry Provider
Production-ready provider with full observability stack:
- Structured Logging: JSON logs with context via structlog
- Distributed Tracing: Context propagation across service boundaries
- Metrics: Prometheus-compatible metrics export
- Export: OTLP gRPC/HTTP to OpenTelemetry Collector
- Integration: Works with Jaeger, Loki, Prometheus, Grafana
Stdout Provider
Simple provider for debugging and development:
- Logs to stdout
- No tracing or metrics
- Minimal overhead
Observability Features
1. Structured Logging
All logs include contextual information.
2. Distributed Tracing
Traces span across gateway operations.
3. Metrics Collection
Comprehensive metrics for monitoring:
Operation Metrics:
- enkrypt_tool_calls_total - Total tool invocations
- enkrypt_tool_call_duration_seconds - Tool execution latency (histogram)
- enkrypt_tool_call_success_total - Successful tool calls
- enkrypt_tool_call_failure_total - Failed tool calls
Cache Metrics:
- enkrypt_cache_hits_total - Cache hits
- enkrypt_cache_misses_total - Cache misses
Security Metrics:
- enkrypt_guardrail_violations_total - Total guardrail violations
- enkrypt_input_guardrail_violations_total - Input violations
- enkrypt_output_guardrail_violations_total - Output violations
- enkrypt_pii_redactions_total - PII redaction events
- enkrypt_tool_call_blocked_total - Blocked tool calls
Authentication and Session Metrics:
- enkrypt_auth_success_total - Successful authentications
- enkrypt_auth_failure_total - Failed authentications
- enkrypt_active_sessions - Current active sessions (gauge)
- enkrypt_active_users - Current active users (gauge)
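Counters like these are usually most useful as ratios. The sketch below derives a tool-call failure rate and a cache hit ratio from raw counter values. The helper names `ratio` and `summarize` are illustrative, and the sample numbers are made up for the example.

```python
from typing import Dict, Mapping


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when there is no traffic yet."""
    return numerator / denominator if denominator else 0.0


def summarize(counters: Mapping[str, float]) -> Dict[str, float]:
    """Derive two health ratios from raw counter values."""
    hits = counters.get("enkrypt_cache_hits_total", 0.0)
    misses = counters.get("enkrypt_cache_misses_total", 0.0)
    return {
        "tool_call_failure_rate": ratio(
            counters.get("enkrypt_tool_call_failure_total", 0.0),
            counters.get("enkrypt_tool_calls_total", 0.0),
        ),
        "cache_hit_ratio": ratio(hits, hits + misses),
    }


# Made-up sample values, for illustration only.
sample = {
    "enkrypt_tool_calls_total": 200,
    "enkrypt_tool_call_failure_total": 5,
    "enkrypt_cache_hits_total": 90,
    "enkrypt_cache_misses_total": 30,
}
print(summarize(sample))
```

In a running deployment the same ratios would normally be computed in Prometheus over a time window rather than from lifetime totals.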
Observability Stack Deployment
The gateway includes a complete observability stack via Docker Compose:
| Service | Port | Purpose |
|---|---|---|
| OpenTelemetry Collector | 4317 (gRPC), 4318 (HTTP) | Receives telemetry data |
| Jaeger UI | 16686 | Trace visualization |
| Loki | 3100 | Log aggregation |
| Prometheus | 9090 | Metrics storage and querying |
| Grafana | 3000 | Unified dashboards |
- Grafana: http://localhost:3000 (anonymous admin access)
- Jaeger: http://localhost:16686
- Prometheus: http://localhost:9090
Telemetry Lifecycle
1. Initialization
Telemetry is initialized during gateway startup:
src/secure_mcp_gateway/gateway.py:796
2. Connectivity Check
Before enabling telemetry, the provider checks endpoint reachability:
src/secure_mcp_gateway/plugins/telemetry/opentelemetry_provider.py:152
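The provider's actual check is not shown here. A reachability test of this kind can be approximated with a plain TCP connection attempt, as in the sketch below. The function name `endpoint_reachable` and the 1-second default timeout are assumptions; 4317 is used as the fallback port because it is the standard OTLP gRPC port.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    # Accept bare "host:port" values by giving urlparse a netloc to work with.
    parsed = urlparse(url if "://" in url else f"//{url}")
    host = parsed.hostname
    if not host:
        return False
    try:
        port = parsed.port or 4317  # fall back to the standard OTLP gRPC port
    except ValueError:
        return False  # port was present but not a valid number
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A TCP connect only proves that something is listening; it does not confirm that the listener speaks OTLP, so it is best treated as a cheap pre-flight check.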
3. No-Op Fallback
If telemetry is disabled or the endpoint is unreachable, the provider uses no-op implementations:
src/secure_mcp_gateway/plugins/telemetry/opentelemetry_provider.py:486
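A no-op fallback keeps call sites unchanged when telemetry is off. The sketch below shows one way to do it. `NoOpLogger`, `NoOpTracer`, `NoOpSpan`, and `select_telemetry` are hypothetical names, and `start_as_current_span` simply mirrors the method name used by OpenTelemetry tracers.

```python
from contextlib import contextmanager
from typing import Any, Iterator, Tuple


class NoOpSpan:
    def set_attribute(self, key: str, value: Any) -> None:
        pass  # attributes are silently discarded


class NoOpTracer:
    @contextmanager
    def start_as_current_span(self, name: str) -> Iterator[NoOpSpan]:
        yield NoOpSpan()


class NoOpLogger:
    def __getattr__(self, level: str) -> Any:
        # Any method (info, warning, error, ...) resolves to a function
        # that accepts and ignores its arguments.
        return lambda *args, **kwargs: None


def select_telemetry(
    enabled: bool, reachable: bool, real_logger: Any, real_tracer: Any
) -> Tuple[Any, Any]:
    """Return the real objects only when telemetry is enabled and reachable."""
    if enabled and reachable:
        return real_logger, real_tracer
    return NoOpLogger(), NoOpTracer()
```

Gateway code can then log and open spans unconditionally, without checking whether telemetry is active at every call site.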
Best Practices
Use Structured Logging
Always include contextual information in logs. This enables powerful filtering and analysis in Loki/Grafana.
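As an illustration of the idea, the sketch below emits one JSON object per log record using only the standard library, so it runs without extra dependencies (the OpenTelemetry provider itself uses structlog). The `JsonFormatter` class and the field names `server_name`, `tool_name`, and `duration_ms` are assumptions for the example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including fields passed via extra."""

    # Attribute names present on every LogRecord; anything else came from extra=.
    _STANDARD = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__) | {
        "message",
        "asctime",
    }

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        payload.update(
            {k: v for k, v in record.__dict__.items() if k not in self._STANDARD}
        )
        return json.dumps(payload, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("gateway.sketch.structured")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

# Field names below are illustrative.
log.info(
    "tool_call_completed",
    extra={"server_name": "github", "tool_name": "search", "duration_ms": 42},
)
```

Keeping the event name constant and putting the variable parts into fields is what makes the logs easy to filter by label later.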
Set Span Attributes
Add meaningful attributes to traces. This makes traces searchable and provides context.
Use Appropriate Log Levels
- DEBUG: Detailed diagnostic information
- INFO: General operational events
- WARNING: Unexpected but handled situations
- ERROR: Error conditions that need attention
- CRITICAL: Severe errors requiring immediate action
Monitor Key Metrics
Set up alerts for:
- High error rates (tool_call_failure_total)
- Slow operations (tool_call_duration_seconds)
- Security events (guardrail_violations_total)
- Resource usage (active_sessions, cache_misses_total)
Use TLS in Production
Configure secure connections: disable the insecure connection option and point the gateway at a TLS-enabled OTLP endpoint.
Performance Considerations
Minimal Overhead
The OpenTelemetry provider is designed for production:
- Batch Processing: Logs, traces, and metrics are batched before export
- Async Export: Telemetry export doesn’t block gateway operations
- Configurable Buffer: 1s timeout, 1024 batch size (configurable)
- Periodic Metrics: Metrics exported every 5 seconds
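The batching behaviour described above can be sketched as a buffer that flushes either when it is full or when its oldest item has waited past the timeout. This is a simplified, synchronous illustration rather than the OpenTelemetry SDK's implementation, which exports from a background thread. `BatchBuffer` and its parameters are hypothetical names, and the defaults mirror the 1s timeout and 1024 batch size listed above.

```python
import time
from typing import Callable, List, Optional


class BatchBuffer:
    """Collects items and flushes when the batch is full or the timeout elapses."""

    def __init__(
        self,
        export: Callable[[List[object]], None],
        max_batch_size: int = 1024,
        timeout_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._export = export
        self._max = max_batch_size
        self._timeout = timeout_s
        self._clock = clock
        self._items: List[object] = []
        self._first_at: Optional[float] = None

    def add(self, item: object) -> None:
        if not self._items:
            self._first_at = self._clock()  # start the timeout for this batch
        self._items.append(item)
        if len(self._items) >= self._max:
            self.flush()

    def tick(self) -> None:
        """Flush a partial batch once its oldest item has waited long enough."""
        if self._items and self._clock() - self._first_at >= self._timeout:
            self.flush()

    def flush(self) -> None:
        if self._items:
            batch, self._items = self._items, []
            self._export(batch)
```

A larger batch size reduces the number of export calls at the cost of memory and latency, while a shorter timeout does the opposite.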
Resource Usage
Typical resource consumption:
- CPU: < 5% overhead for telemetry operations
- Memory: ~50-100MB for OpenTelemetry SDK and buffers
- Network: Depends on log volume and metric cardinality
Optimization Tips
- Adjust Log Level: Use INFO or WARNING in production, DEBUG only when troubleshooting
- Sample Traces: For high-volume deployments, use trace sampling in the collector
- Metric Cardinality: Avoid high-cardinality labels (e.g., user IDs in metric labels)
- Buffer Tuning: Adjust batch size and timeout in collector config
Troubleshooting
Telemetry Not Working
Check connectivity to the configured OTLP endpoint.
Logs Not Appearing in Loki
- Check Loki is running: curl http://localhost:3100/ready
- Verify collector exports to Loki: Check exporters.otlphttp/loki in collector config
- Check Grafana datasource: Navigate to Grafana → Connections → Loki
Metrics Not in Prometheus
- Check Prometheus scrape config: http://localhost:9090/config
- Verify collector metrics endpoint: curl http://localhost:8889/metrics
- Check scrape targets: http://localhost:9090/targets
Traces Not in Jaeger
- Check Jaeger is running: curl http://localhost:16686
- Verify collector exports to Jaeger: Check exporters.otlp in collector config
- Check Jaeger receives data: Navigate to Jaeger UI → Search
Next Steps
OpenTelemetry Setup
Configure OTLP export, distributed tracing, and collector setup
Metrics
Explore Prometheus metrics, Grafana dashboards, and alerting
Logging
Learn about log formats, levels, and aggregation with Loki
API Reference
Explore the REST API for programmatic monitoring