Overview

The ConversationManager maintains conversation history with configurable memory limits. When a limit is exceeded, the oldest messages are automatically trimmed to prevent unbounded memory growth and token usage.

Why Memory Management?

Token Limits

Long conversations can exceed model context windows (e.g., 128k tokens)

Memory Usage

Unbounded history causes memory leaks in long-running sessions

API Costs

Every message is sent to the LLM on each request (affects cost)

Performance

Smaller history = faster LLM processing

HistoryConfig

interface HistoryConfig {
  /** Maximum number of messages to keep in history. 
   *  When exceeded, oldest messages are trimmed. 
   *  Set to 0 for unlimited. */
  maxMessages: number;
  
  /** Maximum total character count across all messages. 
   *  When exceeded, oldest messages are trimmed. 
   *  Set to 0 for unlimited. */
  maxTotalChars: number;
}

Default Configuration

const DEFAULT_HISTORY_CONFIG = {
  maxMessages: 100,       // Keep last 100 messages
  maxTotalChars: 0,       // Unlimited characters
};

Customizing

import { VoiceAgent } from 'voice-agent-ai-sdk';
import { openai } from '@ai-sdk/openai';

const agent = new VoiceAgent({
  model: openai('gpt-4o'),
  
  history: {
    maxMessages: 50,        // Keep last 50 messages (25 turns)
    maxTotalChars: 50000,   // Or trim when total exceeds 50k chars
  },
});

Trimming Behavior

Message Count Trimming

When maxMessages is exceeded, oldest messages are removed in pairs to preserve user/assistant turns:
if (maxMessages > 0 && this.conversationHistory.length > maxMessages) {
  const excess = this.conversationHistory.length - maxMessages;
  // Round up to even number to preserve turn pairs
  const toRemove = excess % 2 === 0 ? excess : excess + 1;
  this.conversationHistory.splice(0, toRemove);
  
  this.emit('history_trimmed', {
    removedCount: toRemove,
    reason: 'max_messages',
  });
}
Example:
// Configuration
history: { maxMessages: 10 }

// Current history (12 messages)
[
  { role: 'user', content: 'Message 1' },
  { role: 'assistant', content: 'Response 1' },
  { role: 'user', content: 'Message 2' },
  { role: 'assistant', content: 'Response 2' },
  // ... 8 more messages ...
]

// After adding 13th message:
// → 4 oldest messages removed (excess = 3, rounded up to 4 to preserve pairs)
// → 'history_trimmed' event: { removedCount: 4, reason: 'max_messages' }
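
The pair-rounding arithmetic above can be checked in isolation. This is a standalone sketch; `countToRemove` is an illustrative helper, not an SDK export:

```typescript
// Standalone sketch of the pair-preserving trim math shown above.
function countToRemove(historyLength: number, maxMessages: number): number {
  if (maxMessages <= 0 || historyLength <= maxMessages) return 0;
  const excess = historyLength - maxMessages;
  // Round up to an even number so user/assistant pairs stay intact.
  return excess % 2 === 0 ? excess : excess + 1;
}

console.log(countToRemove(13, 10)); // excess 3 → 4 (rounded up to a full pair)
console.log(countToRemove(12, 10)); // excess 2 → 2 (already even)
console.log(countToRemove(10, 10)); // within limit → 0
```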

Character Count Trimming

When maxTotalChars is exceeded, messages are removed one at a time, oldest first, until the total is under the limit:
if (maxTotalChars > 0) {
  let totalChars = this.conversationHistory.reduce((sum, msg) => {
    const content = typeof msg.content === 'string'
      ? msg.content
      : JSON.stringify(msg.content);
    return sum + content.length;
  }, 0);

  let removedCount = 0;
  while (totalChars > maxTotalChars && this.conversationHistory.length > 2) {
    const removed = this.conversationHistory.shift();
    if (removed) {
      const content = typeof removed.content === 'string'
        ? removed.content
        : JSON.stringify(removed.content);
      totalChars -= content.length;
      removedCount++;
    }
  }
  
  if (removedCount > 0) {
    this.emit('history_trimmed', {
      removedCount,
      reason: 'max_total_chars',
    });
  }
}
Example:
// Configuration
history: { maxTotalChars: 1000 }

// Current history (1200 chars total)
[
  { role: 'user', content: '200 chars...' },      // ← Oldest
  { role: 'assistant', content: '300 chars...' }, // ← Second oldest
  { role: 'user', content: '400 chars...' },
  { role: 'assistant', content: '300 chars...' },
]

// After adding new message (150 chars):
// Total = 1350 chars (exceeds limit)
// → Remove oldest message (200 chars) → Total = 1150
// → Remove second oldest (300 chars) → Total = 850 ✓
// → 'history_trimmed' event: { removedCount: 2, reason: 'max_total_chars' }

Minimum History Retention

Character-based trimming always keeps at least 2 messages (one user/assistant pair):
while (totalChars > maxTotalChars && this.conversationHistory.length > 2) {
  // Remove oldest message
}
This ensures the model always has some context, even if a single message exceeds the limit.
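
The character-based trimming and the 2-message floor can be simulated together with a standalone sketch; `trimByChars` and `Msg` are illustrative names, not SDK exports:

```typescript
// Standalone simulation of character-based trimming with the 2-message floor.
type Msg = { role: string; content: string };

function trimByChars(
  history: Msg[],
  maxTotalChars: number
): { kept: Msg[]; removedCount: number } {
  const kept = [...history];
  let total = kept.reduce((sum, m) => sum + m.content.length, 0);
  let removedCount = 0;
  // Always retain at least one user/assistant pair, even over the limit.
  while (total > maxTotalChars && kept.length > 2) {
    const removed = kept.shift()!;
    total -= removed.content.length;
    removedCount++;
  }
  return { kept, removedCount };
}

// Reproduces the walkthrough above: 1350 chars against a 1000-char limit.
const history: Msg[] = [
  { role: 'user', content: 'a'.repeat(200) },
  { role: 'assistant', content: 'b'.repeat(300) },
  { role: 'user', content: 'c'.repeat(400) },
  { role: 'assistant', content: 'd'.repeat(300) },
  { role: 'user', content: 'e'.repeat(150) },
];

const { kept, removedCount } = trimByChars(history, 1000);
console.log(removedCount); // 2
console.log(kept.reduce((s, m) => s + m.content.length, 0)); // 850
```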

Unlimited History

Set limits to 0 to disable trimming:
history: {
  maxMessages: 0,      // Unlimited messages
  maxTotalChars: 0,    // Unlimited characters
}
Unlimited history can cause:
  • Memory leaks in long-running sessions
  • Token limit errors when history exceeds model context window
  • High API costs as every message is sent on each request
Use unlimited history only for:
  • Short-lived sessions (< 10 minutes)
  • Testing and development
  • Sessions with explicit manual cleanup

Events

history_trimmed (payload: object) — emitted when conversation history is automatically trimmed:
{
  removedCount: number;  // Number of messages removed
  reason: 'max_messages' | 'max_total_chars';
}

history_cleared (payload: void) — emitted when conversation history is manually cleared via clearHistory()

Listening to Trim Events

import { VoiceAgent } from 'voice-agent-ai-sdk';
import { openai } from '@ai-sdk/openai';

const agent = new VoiceAgent({
  model: openai('gpt-4o'),
  history: { maxMessages: 20 },
});

agent.on('history_trimmed', ({ removedCount, reason }) => {
  console.log(`Trimmed ${removedCount} messages (reason: ${reason})`);
  
  // Optional: Log to analytics, notify user, etc.
});

agent.on('history_cleared', () => {
  console.log('History cleared manually');
});

Manual History Management

Clear History

Remove all messages:
agent.clearHistory();
// Emits 'history_cleared' event

Get History

Retrieve current conversation:
import type { ModelMessage } from 'ai';

const history: ModelMessage[] = agent.getHistory();
console.log(`${history.length} messages in history`);

history.forEach((msg) => {
  console.log(`${msg.role}: ${msg.content}`);
});

Set History

Restore conversation from saved state:
import type { ModelMessage } from 'ai';

// Save history to database/file
const savedHistory: ModelMessage[] = agent.getHistory();
await db.save(userId, savedHistory);

// Later: restore history
const restoredHistory = await db.load(userId);
agent.setHistory(restoredHistory);

Get History Length

const messageCount = agent.getHistory().length;
console.log(`${messageCount} messages in history`);

Content Type Handling

The character count includes:
  • String content: Counted directly
  • Multimodal content: JSON-stringified for counting
// String content
{ role: 'user', content: 'Hello!' }  // 6 chars

// Multimodal content
{
  role: 'user',
  content: [
    { type: 'text', text: 'Describe this image' },
    { type: 'image', image: 'base64EncodedData...' },
  ]
}
// JSON.stringify(content).length counted
Image data is counted in character limits. Use maxMessages instead of maxTotalChars for vision-enabled agents to avoid unpredictable trimming.
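
The counting rule can be expressed as a small standalone function; `contentChars` is an illustrative name, not an SDK export:

```typescript
// Sketch of how each content type contributes to the character count,
// per the rules above: strings directly, multimodal via JSON.stringify.
type Content = string | Array<Record<string, unknown>>;

function contentChars(content: Content): number {
  return typeof content === 'string'
    ? content.length
    : JSON.stringify(content).length;
}

console.log(contentChars('Hello!')); // 6

const multimodal = [
  { type: 'text', text: 'Describe this image' },
  { type: 'image', image: 'base64EncodedData...' },
];
// JSON-stringified length — image payload is included in the count
console.log(contentChars(multimodal));
```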

Example: Session-Based History

import { VoiceAgent } from 'voice-agent-ai-sdk';
import { openai } from '@ai-sdk/openai';
import { WebSocketServer } from 'ws';
import Redis from 'ioredis';

const redis = new Redis();
const wss = new WebSocketServer({ port: 8080 }); // port is illustrative

interface Session {
  userId: string;
  agent: VoiceAgent;
}

const sessions = new Map<string, Session>();

wss.on('connection', async (socket, req) => {
  const userId = req.headers['user-id'] as string;
  
  // Load saved history
  const savedHistory = await redis.get(`history:${userId}`);
  
  const agent = new VoiceAgent({
    model: openai('gpt-4o'),
    history: {
      maxMessages: 50,       // Keep last 50 messages
      maxTotalChars: 100000, // Or 100k chars
    },
  });
  
  // Restore history
  if (savedHistory) {
    agent.setHistory(JSON.parse(savedHistory));
    console.log(`Restored ${agent.getHistory().length} messages for ${userId}`);
  }
  
  agent.handleSocket(socket);
  sessions.set(userId, { userId, agent });
  
  // Save history periodically
  const saveInterval = setInterval(async () => {
    const history = agent.getHistory();
    await redis.set(`history:${userId}`, JSON.stringify(history));
  }, 30000); // Every 30 seconds
  
  agent.on('disconnected', async () => {
    clearInterval(saveInterval);
    
    // Final save
    const history = agent.getHistory();
    await redis.set(`history:${userId}`, JSON.stringify(history));
    
    agent.destroy();
    sessions.delete(userId);
  });
});

Short Sessions (5-10 minutes)

history: {
  maxMessages: 30,   // ~15 conversation turns
  maxTotalChars: 0,  // Unlimited (trimming by count sufficient)
}

Medium Sessions (30-60 minutes)

history: {
  maxMessages: 100,     // ~50 turns
  maxTotalChars: 50000, // ~50k chars
}

Long Sessions (hours)

history: {
  maxMessages: 200,      // ~100 turns
  maxTotalChars: 100000, // ~100k chars
}

Vision Agents (VideoAgent)

history: {
  maxMessages: 20,   // Images inflate char count
  maxTotalChars: 0,  // Use message count only
}

Cost-Optimized

history: {
  maxMessages: 20,      // Fewer messages = lower API cost
  maxTotalChars: 10000, // Strict char limit
}

Token vs. Character Count

The SDK uses character count, not token count, because:
  • Simplicity: No tokenizer dependency
  • Predictability: Same for all models
  • Performance: Faster than tokenization
As a rough approximation:
  • GPT models: ~4 characters = 1 token
  • Claude models: ~3.5 characters = 1 token
So maxTotalChars: 50000 ≈ 12,500 tokens for GPT models, or ≈ 14,300 for Claude models.
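
The approximation can be wrapped in a small helper; `estimateTokens` is an illustrative function, not an SDK export:

```typescript
// Rough character→token estimate using the ratios above.
// Real counts require the model's tokenizer (see below).
function estimateTokens(chars: number, charsPerToken = 4): number {
  return Math.ceil(chars / charsPerToken);
}

console.log(estimateTokens(50000));      // 12500 (GPT ratio, ~4 chars/token)
console.log(estimateTokens(50000, 3.5)); // 14286 (Claude ratio, ~3.5 chars/token)
```
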
For precise token counting, use the model’s tokenizer externally:
import { encodingForModel } from 'js-tiktoken';

const encoding = encodingForModel('gpt-4');
const tokens = encoding.encode('Your text here');
console.log(`${tokens.length} tokens`);

Best Practices

Begin with lower limits and increase if needed:
history: {
  maxMessages: 50,
  maxTotalChars: 30000,
}
Log history_trimmed to understand actual usage:
agent.on('history_trimmed', ({ removedCount, reason }) => {
  analytics.track('history_trimmed', { removedCount, reason });
});
Persist history to database for session restoration:
// On disconnect
const history = agent.getHistory();
await db.saveHistory(userId, history);
Image data inflates character counts unpredictably:
// VideoAgent config
history: {
  maxMessages: 20,   // Use message count
  maxTotalChars: 0,  // Disable char limit
}
Allow users to start fresh:
// User says "let's talk about something else"
agent.clearHistory();

Limitations

System messages are not affected by trimming. The instructions (system prompt) are always sent separately and don't count toward the limits.
Trimming is irreversible. Once messages are removed, they cannot be recovered unless saved externally.

Next Steps

VoiceAgent

Learn about the voice agent architecture

Streaming Speech

Understand speech chunking and generation

API Reference

Full VoiceAgent API documentation

Quick Start

Build your first voice agent
