Metrics and Monitoring
Spacebot exposes comprehensive Prometheus metrics for monitoring LLM usage, tool execution, memory operations, and system health. All telemetry is optional and compiles to zero runtime cost when disabled.
Feature Gate
All telemetry code is behind the `metrics` cargo feature flag. Without it, every instrumentation block compiles out to nothing.
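A minimal sketch of what statement-level and block-level gating can look like. The function `handle_request` and its body are hypothetical stand-ins, not code from the Spacebot sources; only the `#[cfg(feature = "metrics")]` attribute comes from this page.

```rust
/// Hypothetical stand-in for an instrumented call site.
fn handle_request(prompt: &str) -> usize {
    // Statement-level gate: without the `metrics` feature this `let` is
    // removed at compile time, so no timer is ever created.
    #[cfg(feature = "metrics")]
    let start = std::time::Instant::now();

    // Placeholder for the real work.
    let reply_len = prompt.len();

    // Block-level gate: the whole block disappears when the feature is off.
    #[cfg(feature = "metrics")]
    {
        let elapsed = start.elapsed().as_secs_f64();
        eprintln!("would record {elapsed:.6}s in a histogram here");
    }

    reply_len
}
```

The function returns the same value whether or not the feature is enabled; only the timing side effect differs.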
The `[metrics]` config block is always parsed for validation but has no effect without the feature enabled.
You can safely include metrics configuration in your `config.toml` even when the feature is disabled. The config will validate but won’t activate any instrumentation.
Endpoints
The metrics server binds to a configurable address (default `0.0.0.0:9090`), separate from the main API server (`127.0.0.1:19898`).

| Path | Response |
|---|---|
| `/metrics` | Prometheus text exposition format (0.0.4) |
| `/health` | 200 OK (liveness probe) |
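To give a feel for what a `/metrics` response contains, the sketch below renders a single counter sample in the Prometheus text exposition format. The helper `render_counter` is hypothetical, and the help string and label values in the usage note are made up for illustration. The real server produces this output through the `prometheus` crate rather than by hand.

```rust
/// Renders one counter sample in the Prometheus text exposition format.
/// Assumes label values need no escaping (no quotes, backslashes, or newlines).
fn render_counter(name: &str, help: &str, labels: &[(&str, &str)], value: u64) -> String {
    let label_str = labels
        .iter()
        .map(|(key, val)| format!("{key}=\"{val}\""))
        .collect::<Vec<_>>()
        .join(",");
    // HELP and TYPE comment lines come first, then the sample line itself.
    format!("# HELP {name} {help}\n# TYPE {name} counter\n{name}{{{label_str}}} {value}\n")
}
```

Calling `render_counter("spacebot_tool_calls_total", "Total tool calls.", &[("agent_id", "main"), ("tool_name", "shell")], 3)` yields a `# HELP` line, a `# TYPE` line, and the sample line `spacebot_tool_calls_total{agent_id="main",tool_name="shell"} 3`.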
Metric Inventory
All metrics are prefixed with `spacebot_`. The registry uses a private `prometheus::Registry` to avoid conflicts with other libraries.
Counters
spacebot_llm_requests_total
Total LLM completion requests, including retries and fallbacks.
- Type: `IntCounterVec`
- Labels: `agent_id`, `model`, `tier`
- Instrumented in: `src/llm/model.rs`, in `SpacebotModel::completion()`
- Cardinality: ~25–375 series (agents × models × tiers)

Incremented on each `completion()` call. Use this to track request volume and identify which models your agents use most.
spacebot_tool_calls_total
Total tool calls executed across all processes.
- Type: `IntCounterVec`
- Labels: `agent_id`, `tool_name`
- Instrumented in: `src/hooks/spacebot.rs`, in `SpacebotHook::on_tool_result()`
- Cardinality: ~20–100 series (agents × tools)
spacebot_memory_reads_total
Total successful memory recall (search) operations.
- Type: `IntCounter` (no labels)
- Instrumented in: `src/tools/memory_recall.rs`, in `MemoryRecallTool::call()`
- Cardinality: 1 series
spacebot_memory_writes_total
Total successful memory save operations.
- Type: `IntCounter` (no labels)
- Instrumented in: `src/tools/memory_save.rs`, in `MemorySaveTool::call()`
- Cardinality: 1 series
spacebot_llm_tokens_total
Total LLM tokens consumed, broken down by direction.
- Type: `IntCounterVec`
- Labels: `agent_id`, `model`, `tier`, `direction`
- Instrumented in: `src/llm/model.rs`, in `SpacebotModel::completion()`
- Cardinality: ~75–1125 series (agents × models × tiers × 3)

The `direction` label is one of:
- `input`: prompt tokens sent to the model
- `output`: completion tokens generated by the model
- `cached_input`: input tokens served from cache (Anthropic only)
spacebot_llm_estimated_cost_dollars
Estimated LLM cost in USD based on a built-in pricing table.
- Type: `CounterVec` (f64)
- Labels: `agent_id`, `model`, `tier`
- Instrumented in: `src/llm/model.rs`, in `SpacebotModel::completion()`
- Cardinality: ~25–375 series

Costs are best-effort estimates. The pricing table covers major models (Claude 4/3.5/3, GPT-4o, o-series, Gemini, DeepSeek) with a conservative fallback for unknown models ($15/M output tokens).
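The arithmetic behind a per-million-token rate can be sketched as follows. This assumes cost scales linearly with token count and reads the fallback figure as USD per million output tokens. The function and constant names are hypothetical, and the built-in pricing table itself is not reproduced here.

```rust
/// Fallback output rate for unknown models, in USD per million tokens.
const FALLBACK_OUTPUT_USD_PER_MILLION: f64 = 15.0;

/// Estimated cost in USD for `tokens` tokens billed at `usd_per_million`.
fn estimated_cost_usd(tokens: u64, usd_per_million: f64) -> f64 {
    tokens as f64 / 1_000_000.0 * usd_per_million
}
```

For example, 200,000 output tokens from an unknown model would add roughly $3.00 to the counter under this assumption.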
spacebot_process_errors_total
Process errors by type, tracking failures across all LLM requests.
- Type: `IntCounterVec`
- Labels: `agent_id`, `process_type`, `error_type`
- Instrumented in: `src/llm/model.rs`, in the `SpacebotModel::completion()` error paths
- Cardinality: ~15–75 series

The `error_type` label classifies failures:
- `timeout`: request exceeded deadline
- `rate_limit`: 429 response from provider
- `auth`: authentication failure
- `server`: 5xx response from provider
- `provider`: provider-specific error
- `unknown`: unclassified error
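One way such a classification could work is sketched below. This is an illustration only: the function `classify_error`, its inputs, and the choice of 401/403 for `auth` are assumptions, and the actual logic in `src/llm/model.rs` may differ. Only the label values and the 429 and 5xx mappings come from the list above.

```rust
/// Maps a failed request to an `error_type` label value.
/// `status` is the HTTP status code, if the provider returned one.
fn classify_error(timed_out: bool, status: Option<u16>) -> &'static str {
    if timed_out {
        return "timeout";
    }
    match status {
        Some(429) => "rate_limit",
        // Assumption: 401 and 403 are treated as authentication failures.
        Some(401) | Some(403) => "auth",
        Some(500..=599) => "server",
        // Assumption: any other provider response is a provider-specific error.
        Some(_) => "provider",
        // No status and no timeout: nothing to classify on.
        None => "unknown",
    }
}
```

Keeping the label set small and fixed, as here, is what keeps the series count for this metric bounded.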
spacebot_memory_updates_total
Memory mutation operations across all agents.
- Type: `IntCounterVec`
- Labels: `agent_id`, `operation`
- Instrumented in: `src/memory/store.rs`, `src/tools/memory_save.rs`, `src/tools/memory_delete.rs`
- Cardinality: ~3–15 series (agents × 3 operations)

The `operation` label is one of `save`, `delete`, or `forget`.
Histograms
spacebot_llm_request_duration_seconds
End-to-end LLM request duration, including retry loops and fallback chain traversal.
- Type: `HistogramVec`
- Labels: `agent_id`, `model`, `tier`
- Buckets: 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 15, 30, 60, 120
- Instrumented in: `src/llm/model.rs`, in `SpacebotModel::completion()`
- Cardinality: ~25–375 series
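Prometheus histogram buckets are cumulative: each observation is counted in every bucket whose upper bound is greater than or equal to the observed value, plus an implicit `+Inf` bucket. The sketch below applies that rule to the bucket list above. It is a standalone illustration (`cumulative_counts` is a hypothetical helper), not the `prometheus` crate's implementation.

```rust
/// Upper bounds in seconds, as listed for spacebot_llm_request_duration_seconds.
const LLM_BUCKETS: [f64; 11] = [0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 15.0, 30.0, 60.0, 120.0];

/// Returns cumulative counts per bucket, with a final entry for the +Inf bucket.
fn cumulative_counts(observations: &[f64]) -> Vec<u64> {
    let mut counts = vec![0u64; LLM_BUCKETS.len() + 1];
    for &seconds in observations {
        for (i, &upper_bound) in LLM_BUCKETS.iter().enumerate() {
            if seconds <= upper_bound {
                counts[i] += 1;
            }
        }
        // The +Inf bucket counts every observation.
        counts[LLM_BUCKETS.len()] += 1;
    }
    counts
}
```

With observations of 0.3 s, 7 s, and 200 s, the 0.5 s bucket holds 1, the 10 s bucket holds 2, and only `+Inf` holds all 3. Requests slower than 120 s are therefore counted but not resolved any further.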
spacebot_tool_call_duration_seconds
Tool call execution duration across all tools.
- Type: `Histogram` (no labels)
- Buckets: 0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30
- Instrumented in: `src/hooks/spacebot.rs`; the timer starts in `on_tool_call()` and is observed in `on_tool_result()`
- Cardinality: 1 series
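The start/observe pairing can be sketched with a shared map of in-flight timers keyed by call ID. This is a minimal standard-library illustration: `start_timer` and `finish_timer` are hypothetical names standing in for the two hook methods, and the real code would observe the duration into the histogram rather than return it.

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};
use std::time::{Duration, Instant};

/// In-flight tool-call timers, keyed by call ID.
static TOOL_CALL_TIMERS: LazyLock<Mutex<HashMap<String, Instant>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

/// Records the start time for a call (the `on_tool_call()` side).
fn start_timer(call_id: &str) {
    TOOL_CALL_TIMERS
        .lock()
        .unwrap()
        .insert(call_id.to_string(), Instant::now());
}

/// Removes the entry and returns the elapsed time (the `on_tool_result()` side).
/// Returns `None` if no timer was started for this ID.
fn finish_timer(call_id: &str) -> Option<Duration> {
    let started = TOOL_CALL_TIMERS.lock().unwrap().remove(call_id)?;
    Some(started.elapsed())
}
```

Because `finish_timer` removes the entry, the map only holds calls that have started and not yet produced a result.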
Duration is tracked via a `LazyLock<Mutex<HashMap<String, Instant>>>` static keyed by Rig’s internal call ID. If a tool call starts but the agent terminates before `on_tool_result` fires, the timer entry remains. This is bounded by concurrent tool calls and not a practical concern.
spacebot_worker_duration_seconds
Worker lifetime duration from spawn to completion.
- Type: `HistogramVec`
- Labels: `agent_id`, `worker_type`
- Buckets: 1, 5, 10, 30, 60, 120, 300, 600, 1800
- Instrumented in: `src/agent/channel.rs`, in `spawn_worker_task()`
- Cardinality: ~1–5 series

Currently, `worker_type` is always `"builtin"`. Future worker backends (OpenCode, MCP) will add additional types.
Gauges
spacebot_active_workers
Currently active workers per agent.
- Type: `IntGaugeVec`
- Labels: `agent_id`
- Instrumented in: `src/agent/channel.rs`, in `spawn_worker_task()`
- Cardinality: ~1–5 series
spacebot_memory_entry_count
Approximate memory entry count per agent.
- Type: `IntGaugeVec`
- Labels: `agent_id`
- Instrumented in: `src/memory/store.rs`, in `save()` (inc) and `delete()` (dec)
- Cardinality: ~1–5 series
This gauge tracks deltas from process start, not the absolute database count. On restart it resets to 0. For the true count, query the database directly.
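The delta-tracking behavior can be illustrated with a plain in-memory counter. `EntryGauge` is a hypothetical stand-in for one agent's gauge value, not the `prometheus` crate's `IntGauge`. Creating a new instance models a process restart.

```rust
/// Stand-in for one agent's gauge: an in-memory value that starts at zero.
struct EntryGauge {
    value: i64,
}

impl EntryGauge {
    /// A fresh gauge, as seen right after process start.
    fn new() -> Self {
        EntryGauge { value: 0 }
    }

    /// Called on a successful save.
    fn inc(&mut self) {
        self.value += 1;
    }

    /// Called on a successful delete.
    fn dec(&mut self) {
        self.value -= 1;
    }

    fn get(&self) -> i64 {
        self.value
    }
}
```

After three saves and one delete the gauge reads 2, regardless of how many entries the database already held. In this sketch, deleting an entry that was saved before the restart would push the value below zero, which is one more reason to treat the gauge as a trend indicator rather than a count.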
spacebot_active_branches
Currently active branches per agent.
- Type: `IntGaugeVec`
- Labels: `agent_id`
- Instrumented in: `src/agent/channel.rs`, at branch spawn (inc) and completion (dec)
- Cardinality: ~1–5 series
Total Cardinality
With 1–5 agents, 5–15 models, and ~20 tools, you can expect:

| Metric | Series estimate |
|---|---|
| `llm_requests_total` | ~25–375 |
| `llm_tokens_total` | ~75–1125 |
| `llm_estimated_cost_dollars` | ~25–375 |
| `tool_calls_total` | ~20–100 |
| `memory_reads_total` | 1 |
| `memory_writes_total` | 1 |
| `llm_request_duration_seconds` | ~25–375 |
| `tool_call_duration_seconds` | 1 |
| `worker_duration_seconds` | ~1–5 |
| `active_workers` | ~1–5 |
| `active_branches` | ~1–5 |
| `memory_entry_count` | ~1–5 |
| `process_errors_total` | ~15–75 |
| `memory_updates_total` | ~3–15 |
| Total | ~195–2463 |
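Each per-metric estimate is the product of the number of distinct values its labels can take. The sketch below reproduces a few of the ranges. The tier count of 5 is inferred from the 25–375 figure and is not stated on this page, and `series_count` is a hypothetical helper.

```rust
/// Number of label combinations (series) for a metric: the product of the
/// number of distinct values each label can take.
/// An unlabeled metric (empty slice) has exactly one series.
fn series_count(label_value_counts: &[u64]) -> u64 {
    label_value_counts.iter().product()
}
```

With 5 agents, 15 models, and an assumed 5 tiers, `series_count(&[5, 15, 5])` gives 375 for `llm_requests_total`. Adding the 3 `direction` values gives 1125 for `llm_tokens_total`. With 5 agents and 20 tools, `series_count(&[5, 20])` gives 100 for `tool_calls_total`.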
Configuration
Enable metrics in the `[metrics]` block of your `config.toml`, then scrape the `/metrics` endpoint from your Prometheus server.
Feature Gate Consistency
Every instrumentation call site uses `#[cfg(feature = "metrics")]` at the statement or block level. No path references `crate::telemetry` without a cfg gate.

| File | Gate type |
|---|---|
| `src/lib.rs` | `#[cfg(feature = "metrics")] pub mod telemetry` |
| `src/main.rs` | `#[cfg(feature = "metrics")] let _metrics_handle = ...` |
| `src/llm/model.rs` | `#[cfg(feature = "metrics")] let start` + blocks |
| `src/hooks/spacebot.rs` | `#[cfg(feature = "metrics")] static TOOL_CALL_TIMERS` + blocks |
| `src/tools/memory_*.rs` | `#[cfg(feature = "metrics")] crate::telemetry::Metrics::global()...` |
| `src/memory/store.rs` | `#[cfg(feature = "metrics")]` blocks |
| `src/agent/channel.rs` | `#[cfg(feature = "metrics")]` (×4, branches + workers) |
| `Cargo.toml` | `prometheus = { version = "0.13", optional = true }` |