Performance Benchmarks

Last Updated: 23 February 2026
Environment: MacBook Pro M3, Python 3.13, Supabase (Singapore region)

Boot Sequence Performance

Metric	Measured
Cold Boot (full `/start` sequence)	~1m 45s (1–2 minutes)
Warm Boot (cached, no script re-run)	~30–60 seconds
Identity Hash Verification	~0.3s
Search Index Prime	~1–2s

Boot time includes: loading 3 core identity files, running boot.py (session recall + creation + context capture + semantic prime), and the Athena daemon startup. The ~1m 45s figure is the end-to-end measured time on an M3 MacBook Pro.Sovereign/Bionic Trade-off: We consciously accept a ~1m 45s delay in exchange for system-wide robustness. This duration ensures the total cognitive surface (laws, identity, memory bank) is loaded, structured, and semantically primed before the first user query—preventing amnesia, context drift, and grounding failures that plague “fast-boot” stateless agents.

Optimizations Applied

Persistent Caching: Embeddings cached to disk, delta sync on changed files
Parallel Phase Execution: Boot phases run concurrently where possible
Canonical Memory: Single materialized view replaces querying 1,100+ session logs

Semantic Search Performance

Query Type	Latency (p50)	Latency (p95)	Results Quality
Simple keyword	180ms	320ms	⭐⭐⭐
Semantic concept	420ms	680ms	⭐⭐⭐⭐⭐
Cross-domain fusion	850ms	1,200ms	⭐⭐⭐⭐⭐

Search Pipeline

Query → Embedding (local) → Parallel Search (Supabase + GraphRAG) → RRF Fusion → Rerank → Top 10

RRF (Reciprocal Rank Fusion) combines results from:

Supabase pgvector — Dense vector similarity
GraphRAG Communities — Structural/relational context
Keyword BM25 — Exact match fallback

Token Economics

| Operation | Tokens (Before) | Tokens (After) | Savings | |-----------|-----------------|----------------|---------|| | Cold start context injection | ~50,000 | ~10,000 (core boot) | 80% | | Full enriched boot (with profile) | ~50,000 | ~14,500 | 71% | | Session handoff (/end) | ~8,000 | ~1,500 | 81% | | Protocol retrieval | ~3,000 | ~800 | 73% |

Boot Payload Breakdown (Measured Feb 2026)

The core boot payload is ~10K tokens — always loaded on /start. The full enriched payload (with user profile and on-demand files) is ~14.5K tokens, loaded adaptively. The Canonical Memory alone is ~4.3K tokens — a single materialized view that supersedes searching 1,100+ session logs.

Component	Source File	Est. Tokens	Load Strategy
Core Identity	`Core_Identity.md`	~3,800	Boot (always)
Canonical Memory	`CANONICAL.md`	~4,300	Boot (always)
User Context	`userContext.md`	~590	Boot (always)
Product Context	`productContext.md`	~345	Boot (always)
Active Context	`activeContext.md`	~930	Boot (always)
User Profile	`User_Profile_Core.md`	~4,477	On-Demand
─── Core Boot Total		~9,965
─── Full Enriched Total		~14,442

How We Achieved This

Document Sharding: Large protocols split into retrievable chunks
Summary Caching: Session summaries pre-computed at /end
Selective Context: Only relevant protocols injected per query
Canonical Memory: Single materialized view supersedes searching 1,100+ session logs

Data Volume Stats

Asset	Count	Size
Protocols & Workflows	120+ protocols, 50+ workflows	~1.5 MB
Case Studies	42	~2.4 MB
Session Logs	1,100+	~4.2 MB
GraphRAG Entities	4,200+	~46 MB
Vector Embeddings	12,800+	~78 MB

Reliability Metrics

Metric	Value
Boot Success Rate	99.2%
Search Availability	99.8%
Data Redundancy	3-way (Local + GitHub + Supabase)
Recovery Time (from cloud)	< 5 minutes

Methodology

All benchmarks measured with:

time python3 .agent/scripts/boot.py
time python3 .agent/scripts/smart_search.py "test query"

Latency measurements averaged over 50 runs. Token counts measured via Anthropic/OpenAI token counters.

These numbers are real production metrics from a live system, not synthetic benchmarks.

Help

About

Performance Benchmarks

Boot Sequence Performance

Optimizations Applied

Semantic Search Performance

Search Pipeline

Token Economics

Boot Payload Breakdown (Measured Feb 2026)

How We Achieved This

Data Volume Stats

Reliability Metrics

Methodology

Build docs developers (and LLMs) love

Help

About

​Boot Sequence Performance

​Optimizations Applied

​Semantic Search Performance

​Search Pipeline

​Token Economics

​Boot Payload Breakdown (Measured Feb 2026)

​How We Achieved This

​Data Volume Stats

​Reliability Metrics

​Methodology

Build docs developers (and LLMs) love

Boot Sequence Performance

Optimizations Applied

Semantic Search Performance

Search Pipeline

Token Economics

Boot Payload Breakdown (Measured Feb 2026)

How We Achieved This

Data Volume Stats

Reliability Metrics

Methodology