Not an LLM Process
The compactor is not an LLM process. It’s a programmatic monitor that watches a single number (the context token count) and spawns workers when thresholds are crossed.
Tiered Thresholds
The compactor uses three thresholds:
Background Compaction (>80%)
Summarize oldest 30% of messages:
- Reads the oldest 30% of messages
- Uses an LLM to summarize them
- Extracts any memorable facts, decisions, preferences
- Saves extracted memories to the memory graph
- Replaces original messages with a summary
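The replacement step can be sketched as follows. This is an illustrative sketch, not the real implementation: `summarize` stands in for the LLM call, and the message shape is assumed.

```python
# Hypothetical sketch: fold the oldest 30% of messages into a single
# summary message. `summarize` stands in for the compactor's LLM call.
def compact_oldest(messages, fraction=0.30, summarize=lambda msgs: "summary"):
    cut = max(1, int(len(messages) * fraction))  # number of messages to fold in
    old, recent = messages[:cut], messages[cut:]
    summary_msg = {"role": "system", "content": summarize(old)}
    return [summary_msg] + recent
```

The recent messages are kept verbatim; only the oldest slice is replaced, so the most relevant context survives untouched.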
Aggressive Compaction (>85%)
Summarize oldest 50% of messages.
Emergency Truncation (>95%)
Drop oldest messages without LLM summarization.
Compaction Flow
Channel completes a turn
After the channel finishes processing a user message, it calls the compactor.
Spawn compaction worker (or emergency truncate)
Emergency truncation is synchronous and fast. Background and aggressive spawn a worker.
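The decision between the three tiers can be sketched as a simple dispatch on usage. This is an assumed shape, not the actual code; the function and action names are hypothetical, but the thresholds and fractions follow the tiers described above.

```python
# Hypothetical dispatcher for the three tiers: >95% truncates
# synchronously (no LLM); >85% and >80% spawn a compaction worker.
def choose_action(used_tokens, window):
    usage = used_tokens / window
    if usage > 0.95:
        return ("emergency_truncate", None)  # synchronous, fast, no LLM
    if usage > 0.85:
        return ("spawn_worker", 0.50)        # aggressive: oldest 50%
    if usage > 0.80:
        return ("spawn_worker", 0.30)        # background: oldest 30%
    return ("noop", None)
```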
Worker summarizes and extracts
The compaction worker:
- Reads old messages
- Summarizes them into a cohesive narrative
- Extracts memories using the `memory_save` tool
- Returns a summary
Compaction Worker Prompt
The compaction worker gets a focused system prompt:
- `memory_save` — Create typed memories
- No other tools (no shell, file, exec)
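The restricted setup might look like the following. The `memory_save` tool name comes from the docs; the config shape and prompt text are assumptions for illustration.

```python
# Illustrative only: the worker runs with a focused prompt and exactly
# one tool. The prompt wording and config keys are assumptions.
COMPACTION_PROMPT = (
    "Summarize the following messages into a cohesive narrative. "
    "Save memorable facts, decisions, and preferences with memory_save."
)

def worker_config():
    return {
        "system_prompt": COMPACTION_PROMPT,
        "tools": ["memory_save"],  # no shell, file, or exec tools
    }
```

Restricting the toolset keeps the worker cheap and safe: it can only read, summarize, and write memories.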
Token Estimation
Context size is estimated using a simple heuristic.
Compaction Lock
Only one compaction runs per channel at a time.
Summary Stacking
Multiple compactions stack chronologically.
Memory Extraction
During compaction, the worker extracts memories.
Context Window Configuration
The context window is configured per agent.
Branch Compaction
Branches inherit large channel histories and can overflow on first call. They have built-in pre-flight compaction.
Worker Compaction
Workers run in segments and compact between segments.
Compaction Observability
Compaction events are logged.
Error Handling
If compaction fails:
- Background/aggressive compaction — Logged, compaction lock released, channel continues normally
- Emergency truncation — Always succeeds (synchronous drop)
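The failure behavior for background/aggressive runs can be sketched as a standard try/finally around the work. This is an assumed shape; the lock and logging interfaces are hypothetical stand-ins.

```python
# Sketch of the failure path described above: a failed compaction is
# logged and the per-channel lock is released; the channel itself is
# never interrupted.
def run_compaction(compact, lock, log):
    try:
        compact()
    except Exception as exc:  # never let a compaction crash the channel
        log(f"compaction failed: {exc}")
    finally:
        lock.release()        # always free the per-channel lock
```

Releasing the lock in `finally` is the important detail: a crashed worker must not wedge future compactions on that channel.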
Best Practices
When to adjust thresholds
Lower thresholds (compact earlier):
- Small context window models (less than 128k tokens)
- Very active channels with rapid message flow
- You want more aggressive memory extraction
Raise thresholds (compact later):
- Large context window models (200k+ tokens)
- Slow-moving conversations
- You want to preserve more raw context
How to tune compaction aggressiveness
Conservative (preserve more context): raise the thresholds.
Aggressive (compact earlier): lower the thresholds.
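As an illustration, the two tunings might look like the presets below. The key names are assumptions, not the real configuration schema; the defaults per the tiers above are 0.80 / 0.85 / 0.95.

```python
# Illustrative threshold presets (key names assumed, not the real
# config schema). Conservative compacts later; aggressive, earlier.
CONSERVATIVE = {"background": 0.85, "aggressive": 0.90, "emergency": 0.97}
AGGRESSIVE = {"background": 0.70, "aggressive": 0.80, "emergency": 0.90}
```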
When to use emergency truncation
Emergency truncation is a safety valve. It fires when:
- Context is critically full (>95%)
- Background compaction hasn’t completed yet
- The next user message would overflow
If emergency truncation fires often:
- Lower background/aggressive thresholds
- Increase context window
- Check if compaction workers are failing
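The safety valve itself is simple enough to sketch: drop the oldest messages, with no LLM call, until estimated usage falls below a target. The function name and target fraction are assumptions for illustration.

```python
# Hypothetical sketch of emergency truncation: synchronously drop the
# oldest messages (no LLM summarization) until estimated usage is
# below the target fraction of the window.
def emergency_truncate(messages, estimate_tokens, window, target=0.80):
    msgs = list(messages)
    while msgs and estimate_tokens(msgs) > window * target:
        msgs.pop(0)  # drop oldest first
    return msgs
```

Because there is no LLM in the loop, this always completes and always succeeds, at the cost of losing the dropped context outright.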
Compaction vs Memory
Compaction and memory are complementary:
- Compaction — Manages context window size. Summarizes old messages. Runs automatically.
- Memory — Extracts structured knowledge. Stores facts, preferences, decisions. Queried by branches.

During compaction, both happen:
- Messages are summarized (compaction)
- Important facts are extracted as typed memories (memory)
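The dual output can be sketched as one pass producing both artifacts. `summarize` and `extract` stand in for the worker's LLM calls; the memory shape is an assumption.

```python
# Sketch of the dual output: one compaction pass yields a summary for
# the context window and typed memories for the memory graph.
def compact_and_extract(old_messages, summarize, extract):
    summary = summarize(old_messages)   # compaction: shrinks the context
    memories = extract(old_messages)    # memory: typed facts and decisions
    return summary, memories
```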
Next Steps
Memory System
Learn how compaction extracts memories
Branches
See how branches handle context overflow
Workers
Understand worker segmentation and compaction
Configuration
Full compaction configuration reference