Metrics and Monitoring

Spacebot exposes comprehensive Prometheus metrics for monitoring LLM usage, tool execution, memory operations, and system health. All telemetry is optional and compiles to zero runtime cost when disabled.

Feature Gate

All telemetry code is behind the metrics cargo feature flag. Without it, every instrumentation block compiles out to nothing.

```bash
cargo build --release --features metrics
```

The [metrics] config block is always parsed and validated, but it has no effect unless the feature is enabled. You can therefore keep metrics configuration in your config.toml even in builds without the feature: the config validates but activates no instrumentation.
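To make "compiles out to nothing" concrete, here is a minimal sketch of a gated call site. The names `handle_request` and `record_duration` are hypothetical and not part of Spacebot; only the `#[cfg(feature = "metrics")]` pattern is taken from this page.

```rust
/// Hypothetical instrumented function (not Spacebot code).
/// Without `--features metrics`, the gated statements are stripped before
/// type checking, so the body reduces to `payload.len()`.
pub fn handle_request(payload: &str) -> usize {
    // Timer only exists in builds with the feature enabled.
    #[cfg(feature = "metrics")]
    let start = std::time::Instant::now();

    let result = payload.len();

    // The whole block disappears when the feature is off.
    #[cfg(feature = "metrics")]
    {
        record_duration(start.elapsed());
    }

    result
}

/// Hypothetical recorder, compiled only when the feature is enabled.
#[cfg(feature = "metrics")]
fn record_duration(elapsed: std::time::Duration) {
    println!("request took {:?}", elapsed);
}
```

Because both the `let start` binding and the block that reads it carry the same gate, the function compiles cleanly in either configuration.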

Endpoints

The metrics server binds to a configurable address (default 0.0.0.0:9090), separate from the main API server (127.0.0.1:19898).

| Path | Response |
| --- | --- |
| `/metrics` | Prometheus text exposition format (0.0.4) |
| `/health` | `200 OK` (liveness probe) |
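As a rough illustration of what a client sees on `/metrics`, the sketch below parses one sample line of the text exposition format. The `Sample` type and `parse_sample` function are hypothetical helpers, and the label values in the doc comment (`main`, `shell`) are made up for the example. The parser is deliberately simplified and is not a full implementation of the format.

```rust
/// One parsed sample from the Prometheus text exposition format.
#[derive(Debug, PartialEq)]
pub struct Sample {
    pub name: String,
    pub labels: Vec<(String, String)>,
    pub value: f64,
}

/// Parses a single sample line such as
/// `spacebot_tool_calls_total{agent_id="main",tool_name="shell"} 42`.
/// Returns `None` for comments (`# HELP`, `# TYPE`), blank lines, and lines
/// this simplified parser does not understand. It does not handle escaped
/// quotes, commas or spaces inside label values, or trailing timestamps.
pub fn parse_sample(line: &str) -> Option<Sample> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // "name{labels} value" or "name value": the value follows the last space.
    let (series, value) = line.rsplit_once(' ')?;
    let value: f64 = value.parse().ok()?;
    let (name, labels) = match series.split_once('{') {
        Some((name, rest)) => {
            let body = rest.strip_suffix('}')?;
            let mut labels = Vec::new();
            for pair in body.split(',').filter(|p| !p.is_empty()) {
                let (key, raw) = pair.split_once('=')?;
                labels.push((key.to_string(), raw.trim_matches('"').to_string()));
            }
            (name, labels)
        }
        None => (series, Vec::new()),
    };
    Some(Sample { name: name.to_string(), labels, value })
}
```

For real scraping, a Prometheus server or an existing parser library is the better choice. The sketch is only meant to show the shape of the data.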

Metric Inventory

All metrics are prefixed with spacebot_. The registry uses a private prometheus::Registry to avoid conflicts with other libraries.

Counters

`spacebot_llm_requests_total`

Total LLM completion requests, including retries and fallbacks.

Type: IntCounterVec
Labels: agent_id, model, tier
Instrumented in: src/llm/model.rs — SpacebotModel::completion()
Cardinality: ~25–375 series (agents × models × tiers)

Incremented once per completion() call. Use this to track request volume and identify which models your agents use most.

`spacebot_tool_calls_total`

Total tool calls executed across all processes.

Type: IntCounterVec
Labels: agent_id, tool_name
Instrumented in: src/hooks/spacebot.rs — SpacebotHook::on_tool_result()
Cardinality: ~20–100 series (agents × tools)

Incremented after each tool call completes, regardless of success or failure. Track which tools your agents rely on and identify patterns in tool usage.

`spacebot_memory_reads_total`

Total successful memory recall (search) operations.

Type: IntCounter (no labels)
Instrumented in: src/tools/memory_recall.rs — MemoryRecallTool::call()
Cardinality: 1 series

`spacebot_memory_writes_total`

Total successful memory save operations.

Type: IntCounter (no labels)
Instrumented in: src/tools/memory_save.rs — MemorySaveTool::call()
Cardinality: 1 series

`spacebot_llm_tokens_total`

Total LLM tokens consumed, broken down by direction.

Type: IntCounterVec
Labels: agent_id, model, tier, direction
Instrumented in: src/llm/model.rs — SpacebotModel::completion()
Cardinality: ~75–1125 series (agents × models × tiers × 3)

The direction label is one of:

input — prompt tokens sent to the model
output — completion tokens generated by the model
cached_input — input tokens served from cache (Anthropic only)

Use this to understand token consumption patterns and identify opportunities for optimization.

`spacebot_llm_estimated_cost_dollars`

Estimated LLM cost in USD based on a built-in pricing table.

Type: CounterVec (f64)
Labels: agent_id, model, tier
Instrumented in: src/llm/model.rs — SpacebotModel::completion()
Cardinality: ~25–375 series

Costs are best-effort estimates. The pricing table covers major models (Claude 4/3.5/3, GPT-4o, o-series, Gemini, DeepSeek) with a conservative fallback for unknown models ($3/M input, $15/M output).
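The fallback rate (3 USD per million input tokens, 15 USD per million output tokens) can be worked through with a small sketch. The function name is hypothetical, and the treatment of cached input tokens is an assumption: this page does not say how they are priced, so the sketch ignores them.

```rust
/// Fallback prices quoted on this page, in USD per million tokens.
const FALLBACK_INPUT_USD_PER_MTOK: f64 = 3.0;
const FALLBACK_OUTPUT_USD_PER_MTOK: f64 = 15.0;

/// Estimates the cost of one request to a model missing from the pricing table.
/// Assumption: cached input tokens are not considered here.
pub fn estimate_fallback_cost_usd(input_tokens: u64, output_tokens: u64) -> f64 {
    let input = input_tokens as f64 / 1_000_000.0 * FALLBACK_INPUT_USD_PER_MTOK;
    let output = output_tokens as f64 / 1_000_000.0 * FALLBACK_OUTPUT_USD_PER_MTOK;
    input + output
}
```

Under these rates, a request with 12,000 input tokens and 800 output tokens comes to roughly 0.048 USD (0.036 for input plus 0.012 for output).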

`spacebot_process_errors_total`

Process errors by type, tracking failures across all LLM requests.

Type: IntCounterVec
Labels: agent_id, process_type, error_type
Instrumented in: src/llm/model.rs — SpacebotModel::completion() error paths
Cardinality: ~15–75 series

The error_type label classifies failures:

timeout — request exceeded deadline
rate_limit — 429 response from provider
auth — authentication failure
server — 5xx response from provider
provider — provider-specific error
unknown — unclassified error
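The classification above could be sketched as a simple mapping. The `Failure` enum is a hypothetical stand-in for whatever error type Spacebot actually uses, and treating HTTP 401 and 403 as `auth` is an assumption; the page only says "authentication failure".

```rust
/// Hypothetical failure shapes; Spacebot's real error type is not shown here.
pub enum Failure {
    DeadlineExceeded,
    Http(u16),
    Provider(String),
    Other,
}

/// Maps a failure to one of the `error_type` label values listed above.
pub fn error_type_label(failure: &Failure) -> &'static str {
    match failure {
        Failure::DeadlineExceeded => "timeout",
        Failure::Http(429) => "rate_limit",
        // Assumption: 401 and 403 are what count as authentication failures.
        Failure::Http(401) | Failure::Http(403) => "auth",
        Failure::Http(500..=599) => "server",
        Failure::Provider(_) => "provider",
        // Anything else falls through to the catch-all label.
        Failure::Http(_) | Failure::Other => "unknown",
    }
}
```

Keeping the label set small and fixed, as here, is what keeps the cardinality of this metric bounded.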

`spacebot_memory_updates_total`

Memory mutation operations across all agents.

Type: IntCounterVec
Labels: agent_id, operation
Instrumented in: src/memory/store.rs, src/tools/memory_save.rs, src/tools/memory_delete.rs
Cardinality: ~3–15 series (agents × 3 operations)

The operation label is one of save, delete, or forget.

Histograms

`spacebot_llm_request_duration_seconds`

End-to-end LLM request duration, including retry loops and fallback chain traversal.

Type: HistogramVec
Labels: agent_id, model, tier
Buckets: 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 15, 30, 60, 120
Instrumented in: src/llm/model.rs — SpacebotModel::completion()
Cardinality: ~25–375 series

Use this to identify slow models, track latency percentiles, and detect performance regressions.
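To see how an observation maps onto the bucket list, the sketch below reproduces the bounds for this histogram and applies the standard Prometheus rule that buckets are cumulative with inclusive upper bounds (`le`). The helper names are hypothetical.

```rust
/// Bucket upper bounds in seconds for spacebot_llm_request_duration_seconds,
/// copied from the list on this page.
pub const LLM_DURATION_BUCKETS: [f64; 11] =
    [0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 15.0, 30.0, 60.0, 120.0];

/// Returns the smallest finite bound that an observation counts toward,
/// or `None` when it exceeds every bound and lands only in `+Inf`.
pub fn smallest_bucket(seconds: f64) -> Option<f64> {
    LLM_DURATION_BUCKETS.iter().copied().find(|&le| seconds <= le)
}

/// Counts how many finite buckets one observation increments.
/// Buckets are cumulative, so every bound at or above the value is bumped.
pub fn buckets_incremented(seconds: f64) -> usize {
    LLM_DURATION_BUCKETS.iter().filter(|&&le| seconds <= le).count()
}
```

A 0.3 s request first lands in the 0.5 bucket and increments nine finite buckets in total. A request longer than 120 s is only visible in `+Inf`, so the largest finite bound effectively caps the resolution of latency percentiles.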

`spacebot_tool_call_duration_seconds`

Tool call execution duration across all tools.

Type: Histogram (no labels)
Buckets: 0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30
Instrumented in: src/hooks/spacebot.rs — timer starts in on_tool_call(), observed in on_tool_result()
Cardinality: 1 series

Duration is tracked via a LazyLock<Mutex<HashMap<String, Instant>>> static keyed by Rig’s internal call ID. If a tool call starts but the agent terminates before on_tool_result fires, the timer entry is never removed. The number of such orphaned entries is bounded by the number of concurrent tool calls, so the leak is not a practical concern.
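The timer map can be sketched with the standard library alone. `start_timer`, `finish_timer`, and `pending_timers` are hypothetical helper names, not Spacebot's hook methods, and the real code records the elapsed time into the histogram rather than returning it. `LazyLock` requires Rust 1.80 or later.

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};
use std::time::{Duration, Instant};

/// In-flight tool call timers keyed by call ID, mirroring the shape described above.
static TOOL_CALL_TIMERS: LazyLock<Mutex<HashMap<String, Instant>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

/// Hypothetical helper for the `on_tool_call` side: remember when the call began.
pub fn start_timer(call_id: &str) {
    let mut timers = TOOL_CALL_TIMERS.lock().unwrap();
    timers.insert(call_id.to_string(), Instant::now());
}

/// Hypothetical helper for the `on_tool_result` side: remove the entry and
/// return the elapsed time, or `None` if no timer was started for this ID.
pub fn finish_timer(call_id: &str) -> Option<Duration> {
    let mut timers = TOOL_CALL_TIMERS.lock().unwrap();
    timers.remove(call_id).map(|started| started.elapsed())
}

/// Number of timers still waiting for a result (including orphaned ones).
pub fn pending_timers() -> usize {
    TOOL_CALL_TIMERS.lock().unwrap().len()
}
```

Removing the entry in `finish_timer` is what keeps the map small: an entry only lingers when a started call never produces a result.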

`spacebot_worker_duration_seconds`

Worker lifetime duration from spawn to completion.

Type: HistogramVec
Labels: agent_id, worker_type
Buckets: 1, 5, 10, 30, 60, 120, 300, 600, 1800
Instrumented in: src/agent/channel.rs — spawn_worker_task()
Cardinality: ~1–5 series

Currently worker_type is always "builtin". Future worker backends (OpenCode, MCP) will add additional types.

Gauges

`spacebot_active_workers`

Currently active workers per agent.

Type: IntGaugeVec
Labels: agent_id
Instrumented in: src/agent/channel.rs — spawn_worker_task()
Cardinality: ~1–5 series

Incremented when a worker task spawns, decremented when it completes. Use this to monitor concurrency and identify agents with long-running workers.
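One way to keep increment and decrement paired is a drop guard. The sketch below is an illustration of that idea using a plain atomic as a stand-in for a single gauge series; `Gauge` and `ActiveGuard` are hypothetical types, and this is not a description of how Spacebot's spawn_worker_task() is written.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Stand-in for one `spacebot_active_workers` series (a single agent_id).
pub struct Gauge {
    value: AtomicI64,
}

impl Gauge {
    pub const fn new() -> Self {
        Gauge { value: AtomicI64::new(0) }
    }

    pub fn get(&self) -> i64 {
        self.value.load(Ordering::Relaxed)
    }
}

/// Increments the gauge on creation and decrements it when dropped,
/// so early returns and panics that unwind cannot skip the decrement.
pub struct ActiveGuard<'a> {
    gauge: &'a Gauge,
}

impl<'a> ActiveGuard<'a> {
    pub fn enter(gauge: &'a Gauge) -> Self {
        gauge.value.fetch_add(1, Ordering::Relaxed);
        ActiveGuard { gauge }
    }
}

impl Drop for ActiveGuard<'_> {
    fn drop(&mut self) {
        self.gauge.value.fetch_sub(1, Ordering::Relaxed);
    }
}
```

Holding the guard for the lifetime of the worker task means the gauge returns to its previous value however the task exits.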

`spacebot_memory_entry_count`

Approximate memory entry count per agent.

Type: IntGaugeVec
Labels: agent_id
Instrumented in: src/memory/store.rs — save() (inc) and delete() (dec)
Cardinality: ~1–5 series

This gauge tracks deltas from process start, not the absolute database count. On restart it resets to 0. For the true count, query the database directly.

`spacebot_active_branches`

Currently active branches per agent.

Type: IntGaugeVec
Labels: agent_id
Instrumented in: src/agent/channel.rs — branch spawn (inc) and completion (dec)
Cardinality: ~1–5 series

Total Cardinality

With 1–5 agents, 5–15 models, and ~20 tools, you can expect:

| Metric | Series estimate |
| --- | --- |
| `llm_requests_total` | ~25–375 |
| `llm_tokens_total` | ~75–1125 |
| `llm_estimated_cost_dollars` | ~25–375 |
| `tool_calls_total` | ~20–100 |
| `memory_reads_total` | 1 |
| `memory_writes_total` | 1 |
| `llm_request_duration_seconds` | ~25–375 |
| `tool_call_duration_seconds` | 1 |
| `worker_duration_seconds` | ~1–5 |
| `active_workers` | ~1–5 |
| `active_branches` | ~1–5 |
| `memory_entry_count` | ~1–5 |
| `process_errors_total` | ~15–75 |
| `memory_updates_total` | ~3–15 |
| Total | ~195–2463 |

This is well within safe operating range for any Prometheus deployment.
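The per-metric estimates are products of label cardinalities, which the sketch below reproduces for the labelled counters. The `Deployment` type is hypothetical. The tier count of 5 used in the usage note is inferred from the published ranges (25 = 1 × 5 × 5 and 375 = 5 × 15 × 5) and is not stated on this page.

```rust
/// Deployment-size assumptions behind the series estimates.
pub struct Deployment {
    pub agents: u64,
    pub models: u64,
    pub tiers: u64,
    pub tools: u64,
}

impl Deployment {
    /// agents × models × tiers
    pub fn llm_requests_series(&self) -> u64 {
        self.agents * self.models * self.tiers
    }

    /// agents × models × tiers × 3 directions (input, output, cached_input)
    pub fn llm_tokens_series(&self) -> u64 {
        self.llm_requests_series() * 3
    }

    /// agents × tools
    pub fn tool_calls_series(&self) -> u64 {
        self.agents * self.tools
    }

    /// agents × 3 operations (save, delete, forget)
    pub fn memory_updates_series(&self) -> u64 {
        self.agents * 3
    }
}
```

With 1 agent, 5 models, 5 tiers, and 20 tools this gives 25, 75, 20, and 3 series; with 5 agents and 15 models it gives 375, 1125, 100, and 15, matching the ends of the ranges in the table.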

Configuration

Enable metrics in your config.toml:

```toml
[metrics]
enabled = true
bind = "0.0.0.0:9090"
```

Then scrape the /metrics endpoint from your Prometheus server:

```yaml
scrape_configs:
  - job_name: 'spacebot'
    static_configs:
      - targets: ['localhost:9090']
```

Feature Gate Consistency

Every instrumentation call site uses #[cfg(feature = "metrics")] at the statement or block level. No path references crate::telemetry without a cfg gate.

| File | Gate type |
| --- | --- |
| `src/lib.rs` | `#[cfg(feature = "metrics")] pub mod telemetry` |
| `src/main.rs` | `#[cfg(feature = "metrics")] let _metrics_handle = ...` |
| `src/llm/model.rs` | `#[cfg(feature = "metrics")] let start` + blocks |
| `src/hooks/spacebot.rs` | `#[cfg(feature = "metrics")] static TOOL_CALL_TIMERS` + blocks |
| `src/tools/memory_*.rs` | `#[cfg(feature = "metrics")] crate::telemetry::Metrics::global()...` |
| `src/memory/store.rs` | `#[cfg(feature = "metrics")]` blocks |
| `src/agent/channel.rs` | `#[cfg(feature = "metrics")]` (×4, branches + workers) |
| `Cargo.toml` | `prometheus = { version = "0.13", optional = true }` |

All call sites are gated consistently, so the feature compiles to zero cost when disabled.
