Metrics and Monitoring

Spacebot exposes comprehensive Prometheus metrics for monitoring LLM usage, tool execution, memory operations, and system health. All telemetry is optional and compiles to zero runtime cost when disabled.

Feature Gate

All telemetry code is behind the metrics cargo feature flag. Without it, every instrumentation block compiles out to nothing.
cargo build --release --features metrics
The [metrics] config block is always parsed for validation, so you can safely keep metrics configuration in your config.toml even when the feature is disabled; the config will validate but won't activate any instrumentation.
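The statement-level gating pattern looks like this in practice. This is a self-contained sketch: the function name and the printed timing are hypothetical, not Spacebot's actual code.

```rust
// Sketch of statement/block-level feature gating. Built without
// `--features metrics`, both cfg'd pieces compile out entirely.
fn complete_request(agent_id: &str) -> String {
    #[cfg(feature = "metrics")]
    let start = std::time::Instant::now();

    let response = format!("completion for {agent_id}"); // placeholder work

    #[cfg(feature = "metrics")]
    {
        // Only compiled in when the `metrics` feature is enabled.
        println!("observed {:?} for {agent_id}", start.elapsed());
    }

    response
}

fn main() {
    println!("{}", complete_request("agent-1"));
}
```

Because the gate sits on the statement and block level rather than inside a runtime branch, the disabled build carries no instrumentation code at all.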

Endpoints

The metrics server binds to a configurable address (default 0.0.0.0:9090), separate from the main API server (127.0.0.1:19898).
Path        Response
/metrics    Prometheus text exposition format (0.0.4)
/health     200 OK (liveness probe)

Metric Inventory

All metrics are prefixed with spacebot_. The registry uses a private prometheus::Registry to avoid conflicts with other libraries.

Counters

spacebot_llm_requests_total

Total LLM completion requests, including retries and fallbacks.
  • Type: IntCounterVec
  • Labels: agent_id, model, tier
  • Instrumented in: src/llm/model.rs — SpacebotModel::completion()
  • Cardinality: ~25–375 series (agents × models × tiers)
Incremented once per completion() call. Use this to track request volume and identify which models your agents use most.

spacebot_tool_calls_total

Total tool calls executed across all processes.
  • Type: IntCounterVec
  • Labels: agent_id, tool_name
  • Instrumented in: src/hooks/spacebot.rs — SpacebotHook::on_tool_result()
  • Cardinality: ~20–100 series (agents × tools)
Incremented after each tool call completes, regardless of success or failure. Track which tools your agents rely on and identify patterns in tool usage.

spacebot_memory_reads_total

Total successful memory recall (search) operations.
  • Type: IntCounter (no labels)
  • Instrumented in: src/tools/memory_recall.rs — MemoryRecallTool::call()
  • Cardinality: 1 series

spacebot_memory_writes_total

Total successful memory save operations.
  • Type: IntCounter (no labels)
  • Instrumented in: src/tools/memory_save.rs — MemorySaveTool::call()
  • Cardinality: 1 series

spacebot_llm_tokens_total

Total LLM tokens consumed, broken down by direction.
  • Type: IntCounterVec
  • Labels: agent_id, model, tier, direction
  • Instrumented in: src/llm/model.rs — SpacebotModel::completion()
  • Cardinality: ~75–1125 series (agents × models × tiers × 3)
The direction label is one of:
  • input — prompt tokens sent to the model
  • output — completion tokens generated by the model
  • cached_input — input tokens served from cache (Anthropic only)
Use this to understand token consumption patterns and identify opportunities for optimization.

spacebot_llm_estimated_cost_dollars

Estimated LLM cost in USD based on a built-in pricing table.
  • Type: CounterVec (f64)
  • Labels: agent_id, model, tier
  • Instrumented in: src/llm/model.rs — SpacebotModel::completion()
  • Cardinality: ~25–375 series
Costs are best-effort estimates. The pricing table covers major models (Claude 4/3.5/3, GPT-4o, o-series, Gemini, DeepSeek) with a conservative fallback for unknown models ($3/M input, $15/M output).
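The estimate is a simple rate lookup with a fallback. A minimal sketch, in which the named models and their per-million-token prices are made up for illustration; the only rates taken from this page are the documented $3/M input and $15/M output fallback.

```rust
use std::collections::HashMap;

// Hypothetical pricing table: (input $/M tokens, output $/M tokens).
fn estimated_cost(model: &str, input_tokens: u64, output_tokens: u64) -> f64 {
    let pricing: HashMap<&str, (f64, f64)> = HashMap::from([
        ("example-small", (0.25, 1.25)), // made-up entries
        ("example-large", (5.0, 25.0)),
    ]);
    // Conservative fallback for models missing from the table.
    let (input_rate, output_rate) = pricing.get(model).copied().unwrap_or((3.0, 15.0));
    (input_tokens as f64 / 1e6) * input_rate + (output_tokens as f64 / 1e6) * output_rate
}

fn main() {
    // Unknown model, so fallback rates apply:
    // 1.0M input × $3 + 0.1M output × $15 = $4.50.
    println!("{:.2}", estimated_cost("mystery-model", 1_000_000, 100_000));
}
```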

spacebot_process_errors_total

Process errors by type, tracking failures across all LLM requests.
  • Type: IntCounterVec
  • Labels: agent_id, process_type, error_type
  • Instrumented in: src/llm/model.rs — SpacebotModel::completion() error paths
  • Cardinality: ~15–75 series
The error_type label classifies failures:
  • timeout — request exceeded deadline
  • rate_limit — 429 response from provider
  • auth — authentication failure
  • server — 5xx response from provider
  • provider — provider-specific error
  • unknown — unclassified error
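A classifier over these label values might look like the following sketch. The mapping from HTTP status codes to labels is an assumption for illustration; Spacebot's actual classifier lives in src/llm/model.rs.

```rust
// Map a failed request to one of the error_type label values listed above.
// `status` is the HTTP status if one was received; `timed_out` marks a
// deadline being exceeded before any response arrived.
fn classify_error(status: Option<u16>, timed_out: bool) -> &'static str {
    if timed_out {
        return "timeout"; // request exceeded deadline
    }
    match status {
        Some(429) => "rate_limit",                      // provider throttled us
        Some(401) | Some(403) => "auth",                // authentication failure
        Some(s) if (500..600).contains(&s) => "server", // 5xx from provider
        Some(_) => "provider",                          // other provider error
        None => "unknown",                              // unclassified failure
    }
}

fn main() {
    println!("{}", classify_error(Some(429), false));
    println!("{}", classify_error(Some(503), false));
    println!("{}", classify_error(None, true));
}
```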

spacebot_memory_updates_total

Memory mutation operations across all agents.
  • Type: IntCounterVec
  • Labels: agent_id, operation
  • Instrumented in: src/memory/store.rs, src/tools/memory_save.rs, src/tools/memory_delete.rs
  • Cardinality: ~3–15 series (agents × 3 operations)
The operation label is one of save, delete, or forget.

Histograms

spacebot_llm_request_duration_seconds

End-to-end LLM request duration, including retry loops and fallback chain traversal.
  • Type: HistogramVec
  • Labels: agent_id, model, tier
  • Buckets: 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 15, 30, 60, 120
  • Instrumented in: src/llm/model.rs — SpacebotModel::completion()
  • Cardinality: ~25–375 series
Use this to identify slow models, track latency percentiles, and detect performance regressions.

spacebot_tool_call_duration_seconds

Tool call execution duration across all tools.
  • Type: Histogram (no labels)
  • Buckets: 0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30
  • Instrumented in: src/hooks/spacebot.rs — timer starts in on_tool_call(), observed in on_tool_result()
  • Cardinality: 1 series
Duration is tracked via a LazyLock<Mutex<HashMap<String, Instant>>> static keyed by Rig’s internal call ID. If a tool call starts but the agent terminates before on_tool_result fires, the timer entry leaks, but the leak is bounded by the number of concurrent tool calls and is not a practical concern.
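The timer map pattern can be sketched standalone with only the standard library (the hook names match the doc; the call IDs are illustrative):

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};
use std::time::{Duration, Instant};

// Start a timer when a tool call begins, take it back out when the result
// arrives. `remove` keeps the map bounded by in-flight calls; an entry only
// lingers if the agent dies between call and result.
static TOOL_CALL_TIMERS: LazyLock<Mutex<HashMap<String, Instant>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

fn on_tool_call(call_id: &str) {
    TOOL_CALL_TIMERS
        .lock()
        .unwrap()
        .insert(call_id.to_string(), Instant::now());
}

fn on_tool_result(call_id: &str) -> Option<Duration> {
    TOOL_CALL_TIMERS
        .lock()
        .unwrap()
        .remove(call_id)
        .map(|start| start.elapsed())
}

fn main() {
    on_tool_call("call-1");
    let elapsed = on_tool_result("call-1").expect("timer was started");
    println!("tool call took {elapsed:?}");
}
```

In the real hook, the elapsed duration would be observed into the histogram rather than printed.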

spacebot_worker_duration_seconds

Worker lifetime duration from spawn to completion.
  • Type: HistogramVec
  • Labels: agent_id, worker_type
  • Buckets: 1, 5, 10, 30, 60, 120, 300, 600, 1800
  • Instrumented in: src/agent/channel.rs — spawn_worker_task()
  • Cardinality: ~1–5 series
Currently worker_type is always "builtin". Future worker backends (OpenCode, MCP) will add additional types.

Gauges

spacebot_active_workers

Currently active workers per agent.
  • Type: IntGaugeVec
  • Labels: agent_id
  • Instrumented in: src/agent/channel.rs — spawn_worker_task()
  • Cardinality: ~1–5 series
Incremented when a worker task spawns, decremented when it completes. Use this to monitor concurrency and identify agents with long-running workers.
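The inc-on-spawn/dec-on-complete pattern can be sketched with a plain AtomicI64 standing in for the per-agent IntGaugeVec (an illustration of the pattern, not Spacebot's actual code). A Drop guard keeps the gauge accurate even if the worker returns early:

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// Stand-in for the spacebot_active_workers gauge.
static ACTIVE_WORKERS: AtomicI64 = AtomicI64::new(0);

struct WorkerGuard;

impl WorkerGuard {
    fn new() -> Self {
        ACTIVE_WORKERS.fetch_add(1, Ordering::Relaxed); // worker spawned
        WorkerGuard
    }
}

impl Drop for WorkerGuard {
    fn drop(&mut self) {
        ACTIVE_WORKERS.fetch_sub(1, Ordering::Relaxed); // worker completed
    }
}

// Returns the gauge value while the worker runs and after it finishes.
fn spawn_and_finish() -> (i64, i64) {
    let during;
    {
        let _guard = WorkerGuard::new();
        during = ACTIVE_WORKERS.load(Ordering::Relaxed);
    } // guard dropped here, gauge decremented
    (during, ACTIVE_WORKERS.load(Ordering::Relaxed))
}

fn main() {
    let (during, after) = spawn_and_finish();
    println!("active during: {during}, after: {after}");
}
```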

spacebot_memory_entry_count

Approximate memory entry count per agent.
  • Type: IntGaugeVec
  • Labels: agent_id
  • Instrumented in: src/memory/store.rs — save() (inc) and delete() (dec)
  • Cardinality: ~1–5 series
This gauge tracks deltas from process start, not the absolute database count. On restart it resets to 0. For the true count, query the database directly.

spacebot_active_branches

Currently active branches per agent.
  • Type: IntGaugeVec
  • Labels: agent_id
  • Instrumented in: src/agent/channel.rs — branch spawn (inc) and completion (dec)
  • Cardinality: ~1–5 series

Total Cardinality

With 1–5 agents, 5–15 models, and ~20 tools, you can expect:
Metric                          Series estimate
llm_requests_total              ~25–375
llm_tokens_total                ~75–1125
llm_estimated_cost_dollars      ~25–375
tool_calls_total                ~20–100
memory_reads_total              1
memory_writes_total             1
llm_request_duration_seconds    ~25–375
tool_call_duration_seconds      1
worker_duration_seconds         ~1–5
active_workers                  ~1–5
active_branches                 ~1–5
memory_entry_count              ~1–5
process_errors_total            ~15–75
memory_updates_total            ~3–15
Total                           ~195–2465
At a few thousand series, this is well within safe operating limits for even a small Prometheus deployment.

Configuration

Enable metrics in your config.toml:
[metrics]
enabled = true
bind = "0.0.0.0:9090"
Then scrape the /metrics endpoint from your Prometheus server:
scrape_configs:
  - job_name: 'spacebot'
    static_configs:
      - targets: ['localhost:9090']

Feature Gate Consistency

Every instrumentation call site uses #[cfg(feature = "metrics")] at the statement or block level. No path references crate::telemetry without a cfg gate.
File                     Gate type
src/lib.rs               #[cfg(feature = "metrics")] pub mod telemetry
src/main.rs              #[cfg(feature = "metrics")] let _metrics_handle = ...
src/llm/model.rs         #[cfg(feature = "metrics")] let start + blocks
src/hooks/spacebot.rs    #[cfg(feature = "metrics")] static TOOL_CALL_TIMERS + blocks
src/tools/memory_*.rs    #[cfg(feature = "metrics")] crate::telemetry::Metrics::global()...
src/memory/store.rs      #[cfg(feature = "metrics")] blocks
src/agent/channel.rs     #[cfg(feature = "metrics")] (×4, branches + workers)
Cargo.toml               prometheus = { version = "0.13", optional = true }
All consistent. The feature compiles to zero cost when disabled.