The Metrics struct provides thread-safe operational telemetry for monitoring OneClaw agent performance. All counters use AtomicU64 for lock-free increments.
Struct Definition
use std::sync::atomic::AtomicU64;
pub struct Metrics {
    pub messages_total: AtomicU64,
    pub messages_secured: AtomicU64,
    pub messages_denied: AtomicU64,
    pub messages_rate_limited: AtomicU64,
    pub llm_calls_total: AtomicU64,
    pub llm_calls_failed: AtomicU64,
    pub llm_tokens_total: AtomicU64,
    pub llm_latency_total_ms: AtomicU64,
    pub memory_stores: AtomicU64,
    pub memory_searches: AtomicU64,
    pub tool_calls_total: AtomicU64,
    pub tool_calls_failed: AtomicU64,
    pub events_published: AtomicU64,
    pub events_processed: AtomicU64,
    pub alerts_triggered: AtomicU64,
    pub chains_executed: AtomicU64,
    pub chain_steps_total: AtomicU64,
    pub errors_total: AtomicU64,
}
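Because every field is an AtomicU64, a struct shaped like Metrics is automatically Send + Sync and can be updated through a shared reference, with no &mut and no Mutex. The sketch below checks this at compile time on a hypothetical three-field stand-in (MiniMetrics), not the real oneclaw_core type.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical three-field stand-in for Metrics (not the real oneclaw_core type).
#[allow(dead_code)]
#[derive(Default)]
pub struct MiniMetrics {
    pub messages_total: AtomicU64,
    pub llm_calls_total: AtomicU64,
    pub errors_total: AtomicU64,
}

// Compiles only if T is safe to send to, and share between, threads.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<MiniMetrics>();

    // A shared reference is enough to update a counter: no &mut, no Mutex.
    let m = MiniMetrics::default();
    let shared: &MiniMetrics = &m;
    shared.errors_total.fetch_add(1, Ordering::Relaxed);
    println!("errors_total = {}", m.errors_total.load(Ordering::Relaxed));
}
```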
Available Metrics
Message Metrics
- messages_total: Total messages received across all channels
- messages_secured: Messages that passed security authorization
- messages_denied: Messages denied by security checks
- messages_rate_limited: Messages rejected by rate limiter (default: 60/min)
LLM Metrics
- llm_calls_total: Total LLM API calls made to providers
- llm_calls_failed: LLM API calls that failed or timed out
- llm_tokens_total: Total tokens consumed across all LLM calls (input + output)
- llm_latency_total_ms: Cumulative LLM latency in milliseconds (divide by llm_calls_total to get the average)
Memory Metrics
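- memory_stores: Total memory store operations (“remember” command)
- memory_searches: Total memory search operations (“recall” command, context retrieval)
Tool Metrics
- tool_calls_total: Total tool execution calls
- tool_calls_failed: Tool executions that failed or returned errors
Event Metrics
- events_published: Total events published to the event bus
- events_processed: Total events processed (drained) from the event bus
- alerts_triggered: Total alerts triggered by event handlers
Chain Metrics
- chains_executed: Total chains executed
- chain_steps_total: Total chain steps executed across all chains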
Error Metrics
- errors_total: Total errors encountered across all operations
Methods
new
Create a new metrics instance with all counters at zero.
Returns: a new Metrics instance with all counters initialized to 0
Example:
use std::sync::atomic::Ordering;
use oneclaw_core::metrics::Metrics;
let metrics = Metrics::new();
assert_eq!(metrics.messages_total.load(Ordering::Relaxed), 0);
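A zero-initialising constructor of this kind can be sketched with AtomicU64::new(0) for each field; AtomicU64::default() also yields 0, so deriving Default is equivalent. SketchMetrics is a hypothetical two-counter subset, and how the real new() also records the boot time for the uptime methods is not shown here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical two-counter stand-in for Metrics.
#[derive(Default)]
pub struct SketchMetrics {
    pub messages_total: AtomicU64,
    pub llm_calls_total: AtomicU64,
}

impl SketchMetrics {
    // Every counter starts at zero; SketchMetrics::default() gives the same result.
    pub fn new() -> Self {
        Self {
            messages_total: AtomicU64::new(0),
            llm_calls_total: AtomicU64::new(0),
        }
    }
}

fn main() {
    let metrics = SketchMetrics::new();
    assert_eq!(metrics.messages_total.load(Ordering::Relaxed), 0);
    assert_eq!(metrics.llm_calls_total.load(Ordering::Relaxed), 0);
}
```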
inc
Increment a counter by 1 (thread-safe).
pub fn inc(counter: &AtomicU64)
Example:
Metrics::inc(&runtime.metrics.messages_total);
Metrics::inc(&runtime.metrics.llm_calls_total);
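Given the Relaxed ordering described under Thread Safety, inc plausibly reduces to a single fetch_add. The free function below is a hypothetical reimplementation for illustration, not the library source. Note that fetch_add wraps around on overflow, which is not a practical concern for a u64 counter.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical equivalent of Metrics::inc: one atomic read-modify-write.
pub fn inc(counter: &AtomicU64) {
    counter.fetch_add(1, Ordering::Relaxed);
}

fn main() {
    let messages_total = AtomicU64::new(0);
    inc(&messages_total);
    inc(&messages_total);
    println!("messages_total = {}", messages_total.load(Ordering::Relaxed)); // prints "messages_total = 2"
}
```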
add
Add a value to a counter (thread-safe).
pub fn add(counter: &AtomicU64, value: u64)
Example:
Metrics::add(&runtime.metrics.llm_tokens_total, 150);
Metrics::add(&runtime.metrics.llm_latency_total_ms, 250);
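A single LLM call typically touches several counters at once. The sketch below assumes add is a Relaxed fetch_add and wraps the three updates in a hypothetical record_llm_call helper over a hypothetical LlmCounters struct; neither name exists in the documented API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical equivalent of Metrics::add.
pub fn add(counter: &AtomicU64, value: u64) {
    counter.fetch_add(value, Ordering::Relaxed);
}

// Hypothetical subset of the LLM counters.
pub struct LlmCounters {
    pub calls_total: AtomicU64,
    pub tokens_total: AtomicU64,
    pub latency_total_ms: AtomicU64,
}

// Hypothetical helper: record one completed LLM call in all three counters.
pub fn record_llm_call(c: &LlmCounters, tokens: u64, latency_ms: u64) {
    add(&c.calls_total, 1);
    add(&c.tokens_total, tokens);
    add(&c.latency_total_ms, latency_ms);
}

fn main() {
    let c = LlmCounters {
        calls_total: AtomicU64::new(0),
        tokens_total: AtomicU64::new(0),
        latency_total_ms: AtomicU64::new(0),
    };
    record_llm_call(&c, 150, 250);
    record_llm_call(&c, 90, 310);
    println!(
        "calls={} tokens={} latency_ms={}",
        c.calls_total.load(Ordering::Relaxed),
        c.tokens_total.load(Ordering::Relaxed),
        c.latency_total_ms.load(Ordering::Relaxed)
    );
}
```

Each counter update is atomic on its own, but the three are not applied as one unit, so a concurrent reader may briefly see the call counted before its tokens.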
uptime_secs
Get uptime in seconds since boot.
pub fn uptime_secs(&self) -> u64
Example:
let uptime = runtime.metrics.uptime_secs();
println!("Agent uptime: {} seconds", uptime);
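One way to implement uptime is to capture a std::time::Instant at startup and read its elapsed time. Instant is monotonic, so wall-clock adjustments do not affect it. The Uptime type and its started_at field are assumptions made for this sketch; the documentation does not say how Metrics stores its boot time.

```rust
use std::time::{Duration, Instant};

// Hypothetical: uptime measured from an Instant captured at construction.
pub struct Uptime {
    started_at: Instant,
}

impl Uptime {
    pub fn new() -> Self {
        Self { started_at: Instant::now() }
    }

    // Whole seconds since `started_at`; the sub-second remainder is truncated.
    pub fn uptime_secs(&self) -> u64 {
        self.started_at.elapsed().as_secs()
    }
}

fn main() {
    let up = Uptime::new();
    std::thread::sleep(Duration::from_millis(1100));
    println!("Agent uptime: {} seconds", up.uptime_secs());
}
```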
uptime_display
Get formatted uptime string (e.g., “2h 15m 30s”).
pub fn uptime_display(&self) -> String
Example:
println!("Uptime: {}", runtime.metrics.uptime_display());
// Output: "Uptime: 2h 15m 30s"
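The "2h 15m 30s" form follows from splitting a second count into hours, minutes and seconds. The format_uptime function below is a hypothetical sketch; the real uptime_display may handle days or omit zero units differently.

```rust
// Hypothetical formatter producing strings like "2h 15m 30s" from a second count.
// The real uptime_display may handle days or drop zero units differently.
pub fn format_uptime(total_secs: u64) -> String {
    let hours = total_secs / 3600;
    let minutes = (total_secs % 3600) / 60;
    let seconds = total_secs % 60;
    format!("{}h {}m {}s", hours, minutes, seconds)
}

fn main() {
    // 2 * 3600 + 15 * 60 + 30 = 8130 seconds
    println!("Uptime: {}", format_uptime(8130)); // prints "Uptime: 2h 15m 30s"
}
```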
avg_llm_latency_ms
Calculate average LLM latency in milliseconds.
pub fn avg_llm_latency_ms(&self) -> u64
Returns: average latency in milliseconds (llm_latency_total_ms divided by llm_calls_total), or 0 if no calls have been made
Example:
let avg_latency = runtime.metrics.avg_llm_latency_ms();
println!("Average LLM latency: {}ms", avg_latency);
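The average can be derived from the two cumulative counters, with a guard for the no-calls case. This free function is a hypothetical reimplementation assuming integer (truncating) division, since the method returns u64.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical reimplementation: total latency / call count, guarded against
// division by zero. Integer division truncates toward zero.
pub fn avg_llm_latency_ms(latency_total_ms: &AtomicU64, calls_total: &AtomicU64) -> u64 {
    let calls = calls_total.load(Ordering::Relaxed);
    if calls == 0 {
        return 0;
    }
    latency_total_ms.load(Ordering::Relaxed) / calls
}

fn main() {
    let latency = AtomicU64::new(0);
    let calls = AtomicU64::new(0);
    println!("{}", avg_llm_latency_ms(&latency, &calls)); // no calls yet: prints 0

    latency.fetch_add(490, Ordering::Relaxed);
    calls.fetch_add(2, Ordering::Relaxed);
    println!("Average LLM latency: {}ms", avg_llm_latency_ms(&latency, &calls)); // prints "Average LLM latency: 245ms"
}
```

The two loads are separate, so under concurrent updates the result is an approximation, which is adequate for monitoring.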
report
Generate a formatted report of all metrics.
pub fn report(&self) -> String
Returns: a multi-line formatted report covering all metric categories
Example:
println!("{}", runtime.metrics.report());
Output:
OneClaw Metrics:
Uptime: 2h 15m 30s
Messages:
Total: 145 | Secured: 142 | Denied: 3 | Rate-limited: 0
LLM:
Calls: 89 | Failed: 2 | Tokens: 15420 | Avg latency: 245ms
Memory:
Stores: 12 | Searches: 45
Tools:
Calls: 23 | Failed: 1
Events:
Published: 67 | Processed: 67 | Alerts: 5
Chains:
Executed: 8 | Steps: 24
Errors: 3
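Each section of the report is a label line followed by "|"-separated values. The sketch below renders only the LLM section from a hypothetical plain LlmSnapshot struct using std::fmt::Write; the two-space indentation is an assumption, and the input values were chosen to reproduce the sample line above.

```rust
use std::fmt::Write;

// Hypothetical plain snapshot of the LLM counters.
pub struct LlmSnapshot {
    pub calls: u64,
    pub failed: u64,
    pub tokens: u64,
    pub latency_total_ms: u64,
}

// Renders one section in the style of the sample report; indentation is assumed.
pub fn llm_section(s: &LlmSnapshot) -> String {
    let avg = if s.calls == 0 { 0 } else { s.latency_total_ms / s.calls };
    let mut out = String::new();
    writeln!(out, "LLM:").unwrap();
    writeln!(
        out,
        "  Calls: {} | Failed: {} | Tokens: {} | Avg latency: {}ms",
        s.calls, s.failed, s.tokens, avg
    )
    .unwrap();
    out
}

fn main() {
    // 21805 ms over 89 calls averages to 245 ms.
    let s = LlmSnapshot { calls: 89, failed: 2, tokens: 15420, latency_total_ms: 21805 };
    print!("{}", llm_section(&s));
}
```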
Accessing Metrics
Metrics are accessible through the Runtime struct:
use std::sync::atomic::Ordering;
use oneclaw_core::metrics::Metrics;
let runtime = Runtime::from_config(config, workspace)?;
// Read a counter
println!("Total messages: {}",
    runtime.metrics.messages_total.load(Ordering::Relaxed));
// Increment a counter
Metrics::inc(&runtime.metrics.messages_total);
// Generate a full report
println!("{}", runtime.metrics.report());
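When several values feed one log line or export, it can help to read the counters once into a plain Copy struct so the rest of the code works with stable numbers. MessageCounters, MessageSnapshot and snapshot() are hypothetical names for this sketch. Each load is individually Relaxed, so the snapshot is not atomic across counters.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical message-counter subset of Metrics.
#[derive(Default)]
pub struct MessageCounters {
    pub total: AtomicU64,
    pub secured: AtomicU64,
    pub denied: AtomicU64,
    pub rate_limited: AtomicU64,
}

// Plain copy of the counters, read once so later code sees stable numbers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MessageSnapshot {
    pub total: u64,
    pub secured: u64,
    pub denied: u64,
    pub rate_limited: u64,
}

impl MessageCounters {
    // Loads each counter separately; not a single atomic snapshot.
    pub fn snapshot(&self) -> MessageSnapshot {
        let o = Ordering::Relaxed;
        MessageSnapshot {
            total: self.total.load(o),
            secured: self.secured.load(o),
            denied: self.denied.load(o),
            rate_limited: self.rate_limited.load(o),
        }
    }
}

fn main() {
    let c = MessageCounters::default();
    c.total.fetch_add(3, Ordering::Relaxed);
    c.denied.fetch_add(1, Ordering::Relaxed);
    let snap = c.snapshot();
    c.total.fetch_add(1, Ordering::Relaxed); // later updates do not change `snap`
    println!("{:?}", snap);
}
```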
Real-time Monitoring
Spawn a background task to periodically report metrics:
use std::sync::Arc;
use std::time::Duration;
use tokio::time::interval;
// runtime.metrics is an Arc<Metrics>, so the task gets its own handle
let metrics = Arc::clone(&runtime.metrics);
tokio::spawn(async move {
    let mut ticker = interval(Duration::from_secs(60));
    loop {
        // The first tick completes immediately; later ticks fire every 60 s
        ticker.tick().await;
        println!("--- Metrics Report ---");
        println!("{}", metrics.report());
    }
});
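Since every counter only grows, a periodic reporter can also keep the previous reading and turn the difference into a rate, such as messages per minute. The per_minute helper is hypothetical; inside the loop above it would be called with the previous and current value of a counter and the tick interval.

```rust
// Hypothetical helper for periodic reporting: turn two readings of a
// monotonically increasing counter into a per-minute rate.
pub fn per_minute(previous: u64, current: u64, interval_secs: u64) -> f64 {
    if interval_secs == 0 {
        return 0.0;
    }
    // saturating_sub yields 0 instead of underflowing if a counter ever went backwards.
    let delta = current.saturating_sub(previous);
    delta as f64 * 60.0 / interval_secs as f64
}

fn main() {
    // e.g. messages_total read at two consecutive 60-second ticks
    let rate = per_minute(120, 145, 60);
    println!("messages/min: {:.1}", rate); // prints "messages/min: 25.0"
}
```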
Alerting on Thresholds
Monitor failure rates and trigger alerts:
use std::sync::atomic::Ordering;
let total_calls = runtime.metrics.llm_calls_total.load(Ordering::Relaxed);
let failed_calls = runtime.metrics.llm_calls_failed.load(Ordering::Relaxed);
if total_calls > 0 {
    let failure_rate = (failed_calls as f64 / total_calls as f64) * 100.0;
    if failure_rate > 10.0 {
        eprintln!("WARNING: LLM failure rate is {:.1}%", failure_rate);
    }
}
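The same threshold check can be factored into small pure functions that are easy to unit-test and reuse for tool failures. failure_rate_pct and should_alert are hypothetical helpers; returning None for zero calls replaces the explicit total_calls > 0 guard.

```rust
// Hypothetical helper: failure rate as a percentage, or None when there
// have been no calls (avoids dividing by zero).
pub fn failure_rate_pct(failed: u64, total: u64) -> Option<f64> {
    if total == 0 {
        None
    } else {
        Some(failed as f64 / total as f64 * 100.0)
    }
}

// Returns true when the rate strictly exceeds `threshold_pct`.
pub fn should_alert(failed: u64, total: u64, threshold_pct: f64) -> bool {
    matches!(failure_rate_pct(failed, total), Some(rate) if rate > threshold_pct)
}

fn main() {
    // Counts taken from the sample report: 2 failures out of 89 LLM calls.
    if let Some(rate) = failure_rate_pct(2, 89) {
        println!("LLM failure rate: {:.1}%", rate);
    }
    assert!(!should_alert(2, 89, 10.0));
    assert!(should_alert(5, 20, 10.0));
}
```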
Prometheus Export (Custom)
Export metrics in Prometheus format:
use std::sync::atomic::Ordering;
use oneclaw_core::metrics::Metrics;
fn export_prometheus(metrics: &Metrics) -> String {
    let o = Ordering::Relaxed;
    format!(
        "# HELP oneclaw_messages_total Total messages received\n\
         # TYPE oneclaw_messages_total counter\n\
         oneclaw_messages_total {}\n\
         \n\
         # HELP oneclaw_llm_calls_total Total LLM calls\n\
         # TYPE oneclaw_llm_calls_total counter\n\
         oneclaw_llm_calls_total {}\n\
         \n\
         # HELP oneclaw_llm_tokens_total Total LLM tokens\n\
         # TYPE oneclaw_llm_tokens_total counter\n\
         oneclaw_llm_tokens_total {}\n",
        metrics.messages_total.load(o),
        metrics.llm_calls_total.load(o),
        metrics.llm_tokens_total.load(o),
    )
}
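As more counters are exported, a single hand-written format string becomes hard to maintain. A table-driven alternative renders each (name, help, value) triple as a HELP/TYPE/sample group; render_counters is a hypothetical helper, and help strings are assumed not to contain backslashes or newlines, which the Prometheus text format would require escaping.

```rust
use std::fmt::Write;

// Hypothetical helper: render (name, help, value) triples as Prometheus
// text-format counters. Help strings are assumed to need no escaping.
pub fn render_counters(counters: &[(&str, &str, u64)]) -> String {
    let mut out = String::new();
    for (name, help, value) in counters {
        writeln!(out, "# HELP {} {}", name, help).unwrap();
        writeln!(out, "# TYPE {} counter", name).unwrap();
        writeln!(out, "{} {}", name, value).unwrap();
    }
    out
}

fn main() {
    let text = render_counters(&[
        ("oneclaw_messages_total", "Total messages received", 145),
        ("oneclaw_llm_calls_total", "Total LLM calls", 89),
    ]);
    print!("{}", text);
}
```

With the real Metrics, each value would come from a Relaxed load of the matching counter, as in export_prometheus.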
Thread Safety
All metrics use AtomicU64 with Ordering::Relaxed for lock-free increments. This provides:
- Thread-safe updates from any thread
- No mutex contention or blocking
- Low overhead: each increment is a single atomic read-modify-write with no lock acquisition
- No ordering between counters: each counter is individually exact, but a reader may see related counters (e.g., llm_calls_total and llm_calls_failed) momentarily out of step, which is acceptable for monitoring
Relaxed ordering is sufficient because telemetry does not need strict ordering.
No increment is ever lost, but other threads may observe updates to different counters in a different order than they were made.
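The "no increment is lost" guarantee can be checked directly: many threads hammer one counter with Relaxed fetch_add, and after all threads are joined the total is exact. count_concurrently is a hypothetical test helper built only on the standard library.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Spawns `threads` workers that each increment one shared counter `per_thread`
// times with Relaxed ordering, then returns the final value.
pub fn count_concurrently(threads: u64, per_thread: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    c.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // join() synchronizes with each worker, so every increment is visible here.
    counter.load(Ordering::Relaxed)
}

fn main() {
    // 8 threads x 10_000 increments: no update is lost.
    println!("{}", count_concurrently(8, 10_000)); // prints 80000
}
```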
Command-Line Access
Users can view metrics via the built-in metrics command:
> metrics
OneClaw Metrics:
Uptime: 2h 15m 30s
Messages:
Total: 145 | Secured: 142 | Denied: 3 | Rate-limited: 0
...
Or view a subset via the status command:
> status
OneClaw Agent v1.5.0
Uptime: 2h 15m 30s
Security: enforced
Memory: 234 entries (sqlite)
...
Messages: 145 total (3 denied)
LLM: 89 calls (avg 245ms)
Type 'health' for detailed check
Type 'metrics' for full telemetry