# SummarizingMemory Example

SummarizingMemory uses a dedicated LLM to compress old conversation history into concise summaries, allowing you to maintain context from long conversations while staying within token limits.

## What SummarizingMemory Does
SummarizingMemory:

- Uses a dedicated (typically cheaper/faster) LLM for summarization
- Compresses old conversation history into summaries
- Maintains a “sliding window” of recent messages
- Automatically triggers summarization when token threshold is reached
- Preserves essential context while reducing token usage
- Well suited to very long conversations where all context matters
## Complete Working Example
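Below is a self-contained sketch of the pattern. All names (`SummarizingMemory`, `activeWindowTokens`, the `Summarizer` type) are illustrative stand-ins rather than a specific library's API, and the summarizer function is a stub in place of a real LLM call.

```typescript
// Illustrative sketch of the SummarizingMemory pattern -- not a real library API.
type Message = { role: "user" | "assistant" | "system"; content: string };

// Stand-in for a call to a dedicated, cheaper summarizer model.
type Summarizer = (summarySoFar: string, evicted: Message[]) => string;

class SummarizingMemory {
  private recent: Message[] = [];
  private summary = "";

  constructor(
    private summarize: Summarizer,
    private activeWindowTokens: number, // threshold that triggers compression
  ) {}

  // Crude token estimate: roughly 4 characters per token.
  private tokens(msgs: Message[]): number {
    return Math.ceil(msgs.reduce((n, m) => n + m.content.length, 0) / 4);
  }

  add(message: Message): void {
    this.recent.push(message);
    // Automatic trigger: once the active window exceeds the token budget,
    // compress the oldest half of the window into the running summary.
    while (this.tokens(this.recent) > this.activeWindowTokens && this.recent.length > 1) {
      const evicted = this.recent.splice(0, Math.ceil(this.recent.length / 2));
      this.summary = this.summarize(this.summary, evicted);
    }
  }

  // Context composition: the summary (if any) plus the recent messages.
  getContext(): Message[] {
    const ctx: Message[] = [];
    if (this.summary) {
      ctx.push({ role: "system", content: `Summary of earlier conversation: ${this.summary}` });
    }
    return ctx.concat(this.recent);
  }

  getSummary(): string {
    return this.summary;
  }
}

// A toy "summarizer" standing in for the dedicated LLM.
const summarize: Summarizer = (prev, evicted) =>
  [prev, ...evicted.map((m) => `${m.role} said: ${m.content.slice(0, 20)}`)]
    .filter(Boolean)
    .join(" | ");

const memory = new SummarizingMemory(summarize, 50); // ~50-token active window
for (let i = 0; i < 20; i++) {
  memory.add({ role: "user", content: `Message number ${i} with some filler text.` });
}
```

After the loop, `memory.getSummary()` returns a non-empty compressed record of the evicted messages, and `memory.getContext()` yields the summary message followed by only the recent window, rather than all 20 messages.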
## Key Configuration
### Dedicated Summarizer Model
Use a separate, often cheaper model for summarization.

### Active Window Tokens
Defines the token threshold that triggers summarization.

### Custom Summary Prompt
Controls how the summarization is performed.

## How It Works
1. **Recent Messages**: Keeps recent messages in the “active window” (up to `activeWindowTokens`)
2. **Automatic Trigger**: When the active window exceeds the token limit, summarization is triggered
3. **Compression**: Uses the dedicated model to summarize older messages
4. **Summary Storage**: Stores the summary and removes the original messages
5. **Context Composition**: Provides both the summary and recent messages to the agent
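The trigger condition in step 2 can be sketched in isolation. The 4-characters-per-token estimate and the budget value here are illustrative assumptions, not fixed behavior of any particular library:

```typescript
// Illustrative trigger check: summarization fires once the active
// window's estimated token count exceeds the configured budget.
const estimateTokens = (texts: string[]): number =>
  Math.ceil(texts.reduce((n, t) => n + t.length, 0) / 4); // ~4 chars per token

const activeWindowTokens = 20; // illustrative budget
const window = [
  "Hello there!",
  "Can you summarize our chat so far?",
  "Sure, here is a recap of the main points.",
];
const shouldSummarize = estimateTokens(window) > activeWindowTokens;
```

With these three messages (87 characters total, roughly 22 estimated tokens), the window exceeds the 20-token budget and `shouldSummarize` is `true`.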
## Inspecting Summaries
You can access the current summary at any time through the memory instance.

## When to Use SummarizingMemory
- Very long conversations where all context is important
- Customer support scenarios with extended interaction history
- Applications that need to remember details from the entire conversation
- When token costs are a concern for long sessions
- Scenarios where losing old context would degrade user experience
## Cost Optimization
SummarizingMemory helps reduce costs by:

- Using a cheaper model (e.g., GPT-4o-mini) for summarization
- Compressing hundreds of tokens into a few dozen
- Allowing the main agent to use fewer tokens per request
- Maintaining context without sending entire conversation history
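A worked example makes the savings concrete. All numbers below are illustrative placeholders, not measured values or current model prices:

```typescript
// Compare sending the full conversation history on every request
// with sending a compressed summary plus the recent window.
const fullHistoryTokens = 8000;   // entire conversation, sent verbatim
const summaryTokens = 300;        // compressed summary of older messages
const recentWindowTokens = 1200;  // recent messages kept verbatim

const withSummary = summaryTokens + recentWindowTokens;      // tokens per request
const savedPerRequest = fullHistoryTokens - withSummary;     // tokens avoided
const reduction = savedPerRequest / fullHistoryTokens;       // fraction saved
```

Under these assumptions each request sends 1,500 tokens instead of 8,000, a reduction of just over 81%, before even counting the cheaper per-token rate of the summarizer model.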
## Comparison with Other Strategies
| Feature | BufferMemory | SlidingWindowMemory | SummarizingMemory |
|---|---|---|---|
| Old context | Dropped | Dropped | Summarized |
| Token efficiency | Low | Medium | High |
| Context retention | Poor (long convos) | Poor (long convos) | Excellent |
| Complexity | Simple | Medium | Advanced |
| Additional cost | None | None | Summarization LLM |
| Best for | Short chats | Medium chats | Long conversations |