
SummarizingMemory Example

SummarizingMemory uses a dedicated LLM to compress old conversation history into concise summaries, allowing you to maintain context from long conversations while staying within token limits.

What SummarizingMemory Does

SummarizingMemory:
  • Uses a dedicated (typically cheaper/faster) LLM for summarization
  • Compresses old conversation history into summaries
  • Maintains a “sliding window” of recent messages
  • Automatically triggers summarization when token threshold is reached
  • Preserves essential context while reducing token usage
  • Perfect for very long conversations where all context matters

Complete Working Example

import 'dotenv/config'

import { createAgent } from '@agentlib/core'
import { openai } from '@agentlib/openai'
import { SummarizingMemory } from '@agentlib/memory'
import { createLogger } from '@agentlib/logger'

/**
 * Summarizing Memory Example
 * 
 * Demonstrates how to use a dedicated (usually cheaper/faster) LLM 
 * to compress old conversation history into a concise summary.
 */

async function main() {
    // 1. Setup the dedicated model for summarization
    const summarizerModel = openai({
        apiKey: process.env['OPENAI_API_KEY'] ?? '',
        model: process.env['OPENAI_MODEL_SUMMARIZER'] ?? '',
        baseURL: process.env['OPENAI_BASE_URL'] ?? ''
    })

    // 2. Setup Summarizing Memory
    // We trigger summarization after 250 tokens to see it in action
    const memory = new SummarizingMemory({
        model: summarizerModel,
        activeWindowTokens: 250,
        summaryPrompt: 'Summarize the user profile and preferences accurately.'
    })

    const agent = createAgent({
        name: 'summarizer-agent',
        systemPrompt: 'You are a personalized assistant. Help the user plan a trip.',
    })
        .provider(openai({
            apiKey: process.env['OPENAI_API_KEY'] ?? '',
            model: process.env['OPENAI_MODEL'] ?? '',
            baseURL: process.env['OPENAI_BASE_URL'] ?? ''
        }))
        .memory(memory)
        .use(createLogger({ level: 'info' }))

    const sessionId = 'travel-planner'

    console.log('--- Summarizing Memory Demo (Compression at 250 tokens) ---\n')

    const interaction = [
        "I am planning a trip to Japan next April. I love sushi and nature.",
        "I want to stay for 2 weeks. My budget is around $5000.",
        "I prefer boutique hotels over large chains.",
        "I also want to visit some hidden gems, not just tourist spots.",
        "Tell me what you know about my travel preferences so far."
    ]

    for (const input of interaction) {
        console.log(`> User: ${input}`)
        const res = await agent.run({ input, sessionId })
        console.log(`Agent: ${res.output}\n`)

        // Check if a summary has been generated yet
        const currentSummary = memory.getSummary(sessionId)
        if (currentSummary) {
            console.log('-- CURRENT COMPRESSED SUMMARY --')
            console.log(currentSummary)
            console.log('--------------------------------\n')
        }
    }
}

main().catch(console.error)

Key Configuration

Dedicated Summarizer Model

Use a separate, often cheaper model for summarization:
const summarizerModel = openai({
    apiKey: process.env['OPENAI_API_KEY'] ?? '',
    model: 'gpt-4o-mini', // Cheaper model for summarization
    baseURL: process.env['OPENAI_BASE_URL'] ?? ''
})

Active Window Tokens

Defines when to trigger summarization:
const memory = new SummarizingMemory({
    model: summarizerModel,
    activeWindowTokens: 250,  // Summarize when exceeding this
    summaryPrompt: 'Summarize the user profile and preferences accurately.'
})

Custom Summary Prompt

Control how the summarization is performed:
summaryPrompt: 'Summarize the user profile and preferences accurately.'
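A single sentence works, but a more structured prompt gives the summarizer clearer instructions about what to keep and what to discard. The prompt below is purely illustrative; tune it to your own domain:

```typescript
// Illustrative only — adjust the "Preserve" and "Drop" lists for your use case.
const summaryPrompt = [
    'Condense the conversation into a short user profile.',
    'Preserve: names, dates, budgets, and stated preferences.',
    'Drop: greetings, filler, and details already captured.'
].join('\n')
```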

How It Works

  1. Recent Messages: Keeps recent messages in the “active window” (up to activeWindowTokens)
  2. Automatic Trigger: When the active window exceeds the token limit, triggers summarization
  3. Compression: Uses the dedicated model to summarize older messages
  4. Summary Storage: Stores the summary and removes the original messages
  5. Context Composition: Provides both summary and recent messages to the agent
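The steps above can be sketched in a few lines. This is a simplified illustration of the trigger-and-compress cycle, not the library's actual implementation; `estimateTokens` (a rough ~4-characters-per-token heuristic) and the `summarize` callback are stand-ins:

```typescript
// Simplified sketch of the summarization cycle described above.
type Message = { role: string; content: string }

// Rough token estimate: ~4 characters per token (a common heuristic).
const estimateTokens = (msgs: Message[]): number =>
    Math.ceil(msgs.map(m => m.content).join(' ').length / 4)

function compressIfNeeded(
    messages: Message[],
    summary: string,
    activeWindowTokens: number,
    summarize: (old: Message[], prevSummary: string) => string
): { messages: Message[]; summary: string } {
    // Step 1–2: if the active window is still within budget, do nothing.
    if (estimateTokens(messages) <= activeWindowTokens) {
        return { messages, summary }
    }
    // Step 3–4: summarize the older half, keep the recent half verbatim.
    const keep = Math.ceil(messages.length / 2)
    const old = messages.slice(0, messages.length - keep)
    const recent = messages.slice(messages.length - keep)
    // Step 5: the caller sends { summary, recent } to the agent as context.
    return { messages: recent, summary: summarize(old, summary) }
}
```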

Inspecting Summaries

You can access the current summary at any time:
const currentSummary = memory.getSummary(sessionId)
if (currentSummary) {
    console.log('Current summary:', currentSummary)
}

When to Use SummarizingMemory

  • Very long conversations where all context is important
  • Customer support scenarios with extended interaction history
  • Applications that need to remember details from the entire conversation
  • When token costs are a concern for long sessions
  • Scenarios where losing old context would degrade user experience

Cost Optimization

SummarizingMemory helps reduce costs by:
  • Using a cheaper model (e.g., GPT-4o-mini) for summarization
  • Compressing hundreds of tokens into a few dozen
  • Allowing the main agent to use fewer tokens per request
  • Maintaining context without sending entire conversation history
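To make the savings concrete, here is a back-of-the-envelope calculation. The numbers are illustrative, not real pricing or measured figures:

```typescript
// Tokens avoided per request once old history is replaced by
// a compressed summary plus a recent-message window.
function contextTokensSaved(
    historyTokens: number,  // full conversation history so far
    summaryTokens: number,  // size of the compressed summary
    windowTokens: number    // recent messages kept verbatim
): number {
    return Math.max(0, historyTokens - (summaryTokens + windowTokens))
}

// e.g. a 2,000-token history compressed to a 60-token summary
// alongside a 250-token active window saves ~1,690 tokens per request.
const saved = contextTokensSaved(2000, 60, 250)
```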

Comparison with Other Strategies

| Feature           | BufferMemory       | SlidingWindowMemory | SummarizingMemory  |
|-------------------|--------------------|---------------------|--------------------|
| Old context       | Dropped            | Dropped             | Summarized         |
| Token efficiency  | Low                | Medium              | High               |
| Context retention | Poor (long convos) | Poor (long convos)  | Excellent          |
| Complexity        | Simple             | Medium              | Advanced           |
| Additional cost   | None               | None                | Summarization LLM  |
| Best for          | Short chats        | Medium chats        | Long conversations |
