Overview
SummarizingMemory compresses old conversation context using a language model. When stored history exceeds activeWindowTokens, older messages are summarized into a single system-style message.
Best for: long-running conversations that need to maintain context without exceeding token limits.
Constructor
import { SummarizingMemory } from '@agentlib/memory'
import { openai } from '@agentlib/openai'
const memory = new SummarizingMemory(config)
Configuration
model
ModelProvider
required
The model provider used to generate summaries. Should be a fast/cheap model (e.g. gpt-4o-mini).
activeWindowTokens
number
default:3000
Token budget for the active (non-summarized) window. When exceeded, the oldest messages are compressed into a summary.
summaryMaxTokens
number
default:600
Maximum tokens to allow for the compressed summary itself.
summaryPrompt
string
default:"You are a memory compression assistant..."
Custom prompt used to generate summaries.
Methods
read()
Retrieve conversation history with summary injected.
async read(options: MemoryReadOptions): Promise<ModelMessage[]>
Parameters:
options.sessionId - Session identifier (defaults to 'default')
Returns: Array of messages, with summary (if exists) as a system message followed by active messages.
Behavior:
- Injects summary as a system message:
[Conversation summary so far]\n{summary}
- Appends recent active window messages
- Updates access timestamp
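The assembly logic above can be sketched as a small standalone function. This is an illustrative sketch of how read() shapes its output, not the library's actual implementation; `assembleRead` and the simplified `ModelMessage` type are hypothetical names for this example.

```typescript
// Simplified message shape for illustration (the real ModelMessage type
// comes from the library).
type ModelMessage = { role: 'system' | 'user' | 'assistant'; content: string }

// Hypothetical sketch: summary (if any) is injected first as a system
// message, followed by the active-window messages.
function assembleRead(summary: string | null, active: ModelMessage[]): ModelMessage[] {
  const out: ModelMessage[] = []
  if (summary) {
    out.push({ role: 'system', content: `[Conversation summary so far]\n${summary}` })
  }
  return out.concat(active)
}
```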
write()
Persist messages and compress if needed.
async write(messages: ModelMessage[], options: MemoryWriteOptions): Promise<void>
Parameters:
messages - Array of messages to store
options.sessionId - Session identifier (defaults to 'default')
options.agentName - Name of the agent storing the messages
options.tags - Metadata tags
Behavior:
- Stores non-system messages in active window
- Checks whether the active window exceeds activeWindowTokens
- Triggers compression if the budget is exceeded
- Updates access timestamp
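The budget check can be sketched as follows. Note that `needsCompression` and the rough 4-characters-per-token heuristic are assumptions for illustration; the library uses its own tokenizer.

```typescript
type Msg = { role: string; content: string }

// Rough token estimate for illustration only (~4 characters per token);
// the real implementation would use the model's tokenizer.
const countTokens = (msgs: Msg[]): number =>
  msgs.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0)

// Hypothetical sketch of the write-path check: compression is triggered
// once the active window's token count exceeds the configured budget.
function needsCompression(active: Msg[], activeWindowTokens: number): boolean {
  return countTokens(active) > activeWindowTokens
}
```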
clear()
Remove stored memory.
async clear(sessionId?: string): Promise<void>
Parameters:
sessionId - If provided, clears only that session. Otherwise clears all sessions.
entries()
Retrieve raw memory entries for inspection/debugging.
async entries(sessionId?: string): Promise<MemoryEntry[]>
Parameters:
sessionId - If provided, returns only that session’s entry
Returns: Array of MemoryEntry objects with metadata.
getSummary()
Get the current summary for a session (for debugging/inspection).
getSummary(sessionId: string): string | null
Parameters:
sessionId - Session to inspect
Returns: The summary text, or null if no summary exists.
Usage Examples
Basic Usage
import { Agent } from '@agentlib/core'
import { SummarizingMemory } from '@agentlib/memory'
import { openai } from '@agentlib/openai'
const memory = new SummarizingMemory({
model: openai({ apiKey: process.env.OPENAI_API_KEY, model: 'gpt-4o-mini' }),
activeWindowTokens: 3000,
})
const agent = new Agent({
name: 'assistant',
memory,
})
Custom Summary Prompt
const memory = new SummarizingMemory({
model: openai({ model: 'gpt-4o-mini' }),
summaryPrompt: `You are a technical assistant.
Summarize the conversation focusing on:
- Code changes made
- Technical decisions
- Outstanding issues
Be extremely concise.`,
})
Inspecting Summaries
const memory = new SummarizingMemory({
model: openai({ model: 'gpt-4o-mini' }),
})
// After a long conversation
const summary = memory.getSummary('user-123')
if (summary) {
console.log('Current summary:', summary)
}
Production Configuration
const memory = new SummarizingMemory({
model: openai({
apiKey: process.env.OPENAI_API_KEY,
model: 'gpt-4o-mini', // Use cheap model for summaries
}),
activeWindowTokens: 4000, // Keep recent context
summaryMaxTokens: 800, // Allow detailed summaries
})
How Compression Works
When the active window exceeds activeWindowTokens:
- Split: Messages are split in half
- Compress: First half is sent to the model for summarization
- Keep: Second half remains as active messages
- Merge: If a summary already exists, it’s included in the compression prompt
- Store: New summary replaces the old one, active window is trimmed
Example flow:
// Before compression (8000 tokens)
Active: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8]
// After compression (4000 tokens)
Summary: "User asked about X, agent explained Y..."
Active: [msg5, msg6, msg7, msg8]
// Next compression merges summaries
Summary: "Previous: User asked about X... New: Discussed Z..."
Active: [msg7, msg8, msg9, msg10]
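The split/compress/keep/merge steps above can be sketched as a single async function, with the model call stubbed out as a `summarize` parameter. This is a minimal sketch under stated assumptions, not the library's actual source; in the real implementation the model provider is called with summaryPrompt and the result is capped at summaryMaxTokens.

```typescript
type Msg = { role: string; content: string }

// Hypothetical sketch of one compression pass.
async function compress(
  active: Msg[],
  prevSummary: string | null,
  summarize: (text: string) => Promise<string>, // stand-in for the model call
): Promise<{ summary: string; active: Msg[] }> {
  const mid = Math.floor(active.length / 2)
  const toCompress = active.slice(0, mid) // Split: older half goes to the model
  const toKeep = active.slice(mid)        // Keep: newer half stays active
  // Merge: fold any existing summary into the compression input
  const input =
    (prevSummary ? `Previous summary:\n${prevSummary}\n\n` : '') +
    toCompress.map((m) => `${m.role}: ${m.content}`).join('\n')
  const summary = await summarize(input)  // Compress
  return { summary, active: toKeep }      // Store: new summary replaces the old
}
```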
Default Summary Prompt
You are a memory compression assistant.
Summarize the following conversation concisely, preserving key facts, decisions, and context.
Output only the summary — no preamble or commentary.
Type Reference
Source: /packages/memory/src/summarizing.ts:12-36
interface SummarizingMemoryConfig {
model: ModelProvider
activeWindowTokens?: number // default: 3000
summaryMaxTokens?: number // default: 600
summaryPrompt?: string
}