Overview

SummarizingMemory compresses old conversation context using a language model. When stored history exceeds activeWindowTokens, older messages are summarized into a single system-style message. Best for: long-running conversations that need to maintain context without exceeding token limits.

Constructor

import { SummarizingMemory } from '@agentlib/memory'
import { openai } from '@agentlib/openai'

const memory = new SummarizingMemory(config)

Configuration

  • model (ModelProvider, required) - The model provider used to generate summaries. Should be a fast/cheap model (e.g. gpt-4o-mini).
  • activeWindowTokens (number, default: 3000) - Token budget for the active (non-summarized) window. When exceeded, the oldest messages are compressed into a summary.
  • summaryMaxTokens (number, default: 600) - Maximum tokens to allow for the compressed summary itself.
  • summaryPrompt (string, default: "You are a memory compression assistant...") - Custom prompt used to generate summaries.

Methods

read()

Retrieve conversation history with summary injected.
async read(options: MemoryReadOptions): Promise<ModelMessage[]>
Parameters:
  • options.sessionId - Session identifier (defaults to 'default')
Returns: Array of messages, with the summary (if one exists) as a system message followed by the active messages.
Behavior:
  • Injects summary as a system message: [Conversation summary so far]\n{summary}
  • Appends recent active window messages
  • Updates access timestamp
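The assembly described above can be sketched in plain TypeScript. This is an illustrative reconstruction, not the library's source: the `assembleRead` helper and the inline `ModelMessage` shape are assumptions made for the sketch.

```typescript
// Minimal message shape assumed for this sketch.
type ModelMessage = { role: 'system' | 'user' | 'assistant'; content: string }

// Sketch of how read() assembles its result: summary first (as a system
// message), then the recent active-window messages.
function assembleRead(summary: string | null, active: ModelMessage[]): ModelMessage[] {
  const result: ModelMessage[] = []
  if (summary) {
    // Summary is injected as a system message with the documented prefix.
    result.push({ role: 'system', content: `[Conversation summary so far]\n${summary}` })
  }
  // Recent active-window messages follow the summary.
  return result.concat(active)
}
```

When no summary exists yet, the result is just the active messages.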

write()

Persist messages and compress if needed.
async write(messages: ModelMessage[], options: MemoryWriteOptions): Promise<void>
Parameters:
  • messages - Array of messages to store
  • options.sessionId - Session identifier (defaults to 'default')
  • options.agentName - Name of the agent storing the messages
  • options.tags - Metadata tags
Behavior:
  • Stores non-system messages in active window
  • Checks if active window exceeds activeWindowTokens
  • Triggers compression if budget exceeded
  • Updates access timestamp
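The budget check can be sketched as follows. The ~4-characters-per-token estimate is an assumption made for illustration; the library's actual token counter is not shown in these docs, and `shouldCompress` is a hypothetical helper, not part of the API.

```typescript
type ModelMessage = { role: 'system' | 'user' | 'assistant'; content: string }

// Sketch of write()'s compression trigger: keep only non-system messages
// for the active window, then compare the estimated token count to the budget.
function shouldCompress(incoming: ModelMessage[], activeWindowTokens: number): boolean {
  const active = incoming.filter((m) => m.role !== 'system')
  // Crude token estimate: ~4 characters per token (an assumption,
  // not the library's actual counter).
  const tokens = active.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0)
  return tokens > activeWindowTokens
}
```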

clear()

Remove stored memory.
async clear(sessionId?: string): Promise<void>
Parameters:
  • sessionId - If provided, clears only that session. Otherwise clears all sessions.

entries()

Retrieve raw memory entries for inspection/debugging.
async entries(sessionId?: string): Promise<MemoryEntry[]>
Parameters:
  • sessionId - If provided, returns only that session’s entry
Returns: Array of MemoryEntry objects with metadata.

getSummary()

Get the current summary for a session (for debugging/inspection).
getSummary(sessionId: string): string | null
Parameters:
  • sessionId - Session to inspect
Returns: The summary text, or null if no summary exists.

Usage Examples

Basic Usage

import { Agent } from '@agentlib/core'
import { SummarizingMemory } from '@agentlib/memory'
import { openai } from '@agentlib/openai'

const memory = new SummarizingMemory({
  model: openai({ apiKey: process.env.OPENAI_API_KEY, model: 'gpt-4o-mini' }),
  activeWindowTokens: 3000,
})

const agent = new Agent({
  name: 'assistant',
  memory,
})

Custom Summary Prompt

const memory = new SummarizingMemory({
  model: openai({ model: 'gpt-4o-mini' }),
  summaryPrompt: `You are a technical assistant.
Summarize the conversation focusing on:
- Code changes made
- Technical decisions
- Outstanding issues
Be extremely concise.`,
})

Inspecting Summaries

const memory = new SummarizingMemory({
  model: openai({ model: 'gpt-4o-mini' }),
})

// After a long conversation
const summary = memory.getSummary('user-123')
if (summary) {
  console.log('Current summary:', summary)
}

Production Configuration

const memory = new SummarizingMemory({
  model: openai({
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini', // Use cheap model for summaries
  }),
  activeWindowTokens: 4000,  // Keep recent context
  summaryMaxTokens: 800,     // Allow detailed summaries
})

How Compression Works

When the active window exceeds activeWindowTokens:
  1. Split: Messages are split in half
  2. Compress: First half is sent to the model for summarization
  3. Keep: Second half remains as active messages
  4. Merge: If a summary already exists, it’s included in the compression prompt
  5. Store: New summary replaces the old one, active window is trimmed
Example flow:
// Before compression (8000 tokens)
Active: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8]

// After compression (4000 tokens)
Summary: "User asked about X, agent explained Y..."
Active: [msg5, msg6, msg7, msg8]

// Next compression merges summaries
Summary: "Previous: User asked about X... New: Discussed Z..."
Active: [msg7, msg8, msg9, msg10]
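The five steps above can be sketched end-to-end as a simplified model. The `summarize` stub stands in for the model call (the real implementation in /packages/memory/src/summarizing.ts sends the summary prompt to the configured ModelProvider, and also enforces summaryMaxTokens, which this sketch omits):

```typescript
type CompressState = { summary: string | null; active: string[] }

// Stub standing in for the model call; an existing summary is merged into
// the compression input, as described in step 4.
function summarize(prior: string | null, messages: string[]): string {
  return prior ? `${prior} ${messages.join(' ')}` : messages.join(' ')
}

function compress(state: CompressState): CompressState {
  // 1. Split: messages are split in half.
  const mid = Math.floor(state.active.length / 2)
  const olderHalf = state.active.slice(0, mid)
  const recentHalf = state.active.slice(mid)
  // 2 & 4. Compress the older half, merging in any existing summary.
  const summary = summarize(state.summary, olderHalf)
  // 3 & 5. The recent half stays active; the new summary replaces the old.
  return { summary, active: recentHalf }
}
```

Running `compress` twice reproduces the flow above: the second pass folds the first summary into the new one while the active window keeps only the most recent messages.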

Default Summary Prompt

You are a memory compression assistant.
Summarize the following conversation concisely, preserving key facts, decisions, and context.
Output only the summary — no preamble or commentary.

Type Reference

Source: /packages/memory/src/summarizing.ts:12-36
interface SummarizingMemoryConfig {
  model: ModelProvider
  activeWindowTokens?: number  // default: 3000
  summaryMaxTokens?: number    // default: 600
  summaryPrompt?: string
}
