Overview
RCLI provides comprehensive benchmarking tools to measure:

- STT: Transcription latency and word error rate (WER)
- LLM: Token generation speed, TTFT, context usage
- TTS: Synthesis time and real-time factor
- E2E: End-to-end pipeline latency
- RAG: Embedding, retrieval, and query latency
- Memory: RAM usage across subsystems
Simple Benchmark
rcli_benchmark
Run N iterations of the full pipeline on a test WAV file.

Parameters:
- Engine handle (must be initialized)
- Path to test WAV file (16kHz mono recommended)
- Number of benchmark runs (3-10 recommended for stable averages)
- Callback for progress and results; pass NULL to skip callbacks. Events fired:
  - "benchmark_progress": Progress update (e.g., "3/10")
  - "benchmark_run": Single run result (JSON)
  - "benchmark_result": Aggregate results (JSON)
- User data passed to the callback

Returns:
- 0: Benchmark completed successfully
- Non-zero: Failed
Example
Output Example
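The original output sample did not survive extraction. As a purely illustrative sketch of what the aggregate "benchmark_result" payload might look like: every field name and value below is a placeholder of my own, not taken from the library.

```json
{
  "runs": 5,
  "avg_ms": { "stt": 210.4, "llm_ttft": 96.2, "tts": 310.8, "e2e": 640.5 },
  "wer": 0.04,
  "tokens_per_sec": 42.1
}
```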
Comprehensive Benchmark Suite
rcli_run_full_benchmark
Run comprehensive benchmarks across all subsystems.

Parameters:
- Engine handle (must be initialized)
- Benchmark suite to run:
  - "all": Run all benchmarks
  - "stt": STT latency + WER accuracy
  - "llm": LLM generation + tool calling
  - "tts": TTS synthesis + RTF
  - "e2e": End-to-end pipeline
  - "tools" or "actions": Action info
  - "rag": RAG retrieval + query
  - "memory": RAM usage
  - Comma-separated combinations, e.g. "stt,llm,tts"
- Number of measured runs per test (3 is typical)
- Optional path to save JSON results; pass NULL to skip export.

Returns:
- 0: Success
- Non-zero: Failed
Example: Full Benchmark
Example: Selective Benchmarks
Benchmark Categories
STT Benchmark
Measures:

- Latency: Time to transcribe audio
- WER: Word error rate across sample utterances:
  - Short commands (“Open Safari”)
  - Questions (“What’s the weather?”)
  - Long commands (multi-sentence)
  - Factual queries
  - Multi-action commands
LLM Benchmark
Measures:

- TTFT: Time to first token (prompt processing)
- Token/s: Generation throughput
- Context usage: Prompt tokens vs. context window
- Tool calling: Accuracy and latency
TTS Benchmark
Measures:

- Synthesis time: Time to generate audio
- RTF: Real-time factor (< 1.0 is faster than real-time)
- Samples generated: Output audio length
E2E Pipeline Benchmark
Measures:

- E2E latency: Speech input → first audio output
- Total latency: Complete pipeline (STT → LLM → TTS)
- Long-form: Multi-sentence responses
RAG Benchmark
Measures:

- Embedding latency: Query → vector
- Retrieval latency: Vector + BM25 search
- Full RAG query: Embedding + retrieval + LLM
RAG benchmark only runs if an index is loaded via rcli_rag_load_index().

Memory Benchmark
Measures:

- LLM: Model + KV cache
- Embedding: RAG embedding model
- STT: Zipformer + Whisper
- TTS: Piper/Kokoro
- Total: Peak RAM usage
JSON Export Format
Complete Example: Benchmark Runner
Performance Targets (M1/M2/M3)
| Metric | Target | Good | Excellent |
|---|---|---|---|
| STT latency | < 300ms | < 200ms | < 150ms |
| LLM TTFT | < 150ms | < 100ms | < 80ms |
| LLM tok/s | > 30 | > 40 | > 50 |
| TTS RTF | < 1.0 | < 0.5 | < 0.3 |
| E2E latency | < 800ms | < 600ms | < 500ms |
| RAG retrieval | < 10ms | < 5ms | < 3ms |
See Also
- State Management - Query performance metrics
- RAG - RAG system details
- RCLI CLI: rcli bench for interactive benchmarks