## Overview

The `ConversationManager` maintains conversation history with configurable memory limits. When a limit is exceeded, the oldest messages are automatically trimmed to prevent unbounded memory growth and token usage.
## Why Memory Management?

- **Token limits**: Long conversations can exceed model context windows (e.g., 128k tokens)
- **Memory usage**: Unbounded history causes memory leaks in long-running sessions
- **API costs**: Every message in history is sent to the LLM on each request
- **Performance**: Smaller history means faster LLM processing
## HistoryConfig

```typescript
interface HistoryConfig {
  /** Maximum number of messages to keep in history.
   * When exceeded, the oldest messages are trimmed.
   * Set to 0 for unlimited. */
  maxMessages: number;

  /** Maximum total character count across all messages.
   * When exceeded, the oldest messages are trimmed.
   * Set to 0 for unlimited. */
  maxTotalChars: number;
}
```
### Default Configuration

```typescript
const DEFAULT_HISTORY_CONFIG = {
  maxMessages: 100, // Keep the last 100 messages
  maxTotalChars: 0, // Unlimited characters
};
```
### Customizing

```typescript
import { VoiceAgent } from 'voice-agent-ai-sdk';
import { openai } from '@ai-sdk/openai';

const agent = new VoiceAgent({
  model: openai('gpt-4o'),
  history: {
    maxMessages: 50,      // Keep the last 50 messages (25 turns)
    maxTotalChars: 50000, // Or trim when the total exceeds 50k chars
  },
});
```
## Trimming Behavior

### Message Count Trimming

When `maxMessages` is exceeded, the oldest messages are removed in pairs to preserve user/assistant turns:

```typescript
if (maxMessages > 0 && this.conversationHistory.length > maxMessages) {
  const excess = this.conversationHistory.length - maxMessages;
  // Round up to an even number to preserve turn pairs
  const toRemove = excess % 2 === 0 ? excess : excess + 1;
  this.conversationHistory.splice(0, toRemove);
  this.emit('history_trimmed', {
    removedCount: toRemove,
    reason: 'max_messages',
  });
}
```
**Example:**

```typescript
// Configuration
history: { maxMessages: 10 }

// Current history (12 messages)
[
  { role: 'user', content: 'Message 1' },
  { role: 'assistant', content: 'Response 1' },
  { role: 'user', content: 'Message 2' },
  { role: 'assistant', content: 'Response 2' },
  // ... 8 more messages ...
]

// After adding the 13th message:
// → 4 oldest messages removed (excess = 3, rounded up to 4 for pairs)
// → 'history_trimmed' event: { removedCount: 4, reason: 'max_messages' }
```
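The pair-rounding rule can be reproduced outside the SDK. Here is a minimal, illustrative sketch (the `Message` type and `trimByCount` helper are hypothetical, not SDK exports) of the same pair-preserving logic:

```typescript
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

// Remove the oldest messages, rounding the excess up to an even
// number so complete user/assistant turns are preserved.
function trimByCount(history: Message[], maxMessages: number): number {
  if (maxMessages <= 0 || history.length <= maxMessages) return 0;
  const excess = history.length - maxMessages;
  const toRemove = excess % 2 === 0 ? excess : excess + 1;
  history.splice(0, toRemove);
  return toRemove; // number of messages removed
}

// 13 messages with a 10-message limit: excess = 3, rounded up to 4
const history: Message[] = [];
for (let i = 0; i < 13; i++) {
  history.push({ role: i % 2 === 0 ? 'user' : 'assistant', content: `msg ${i}` });
}
const removed = trimByCount(history, 10);
console.log(removed, history.length); // → 4 9
```

Because the excess is rounded up to an even number, the first remaining message is always a user message, keeping turns intact.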
### Character Count Trimming

When `maxTotalChars` is exceeded, messages are removed one at a time from the oldest until the total is under the limit:

```typescript
if (maxTotalChars > 0) {
  let totalChars = this.conversationHistory.reduce((sum, msg) => {
    const content = typeof msg.content === 'string'
      ? msg.content
      : JSON.stringify(msg.content);
    return sum + content.length;
  }, 0);

  let removedCount = 0;
  while (totalChars > maxTotalChars && this.conversationHistory.length > 2) {
    const removed = this.conversationHistory.shift();
    if (removed) {
      const content = typeof removed.content === 'string'
        ? removed.content
        : JSON.stringify(removed.content);
      totalChars -= content.length;
      removedCount++;
    }
  }

  if (removedCount > 0) {
    this.emit('history_trimmed', {
      removedCount,
      reason: 'max_total_chars',
    });
  }
}
```
**Example:**

```typescript
// Configuration
history: { maxTotalChars: 1000 }

// Current history (1200 chars total)
[
  { role: 'user', content: '200 chars...' },      // ← Oldest
  { role: 'assistant', content: '300 chars...' }, // ← Second oldest
  { role: 'user', content: '400 chars...' },
  { role: 'assistant', content: '300 chars...' },
]

// After adding a new message (150 chars):
// Total = 1350 chars (exceeds limit)
// → Remove oldest message (200 chars) → Total = 1150
// → Remove second oldest (300 chars)  → Total = 850 ✓
// → 'history_trimmed' event: { removedCount: 2, reason: 'max_total_chars' }
```
### Minimum History Retention

Character-based trimming always keeps at least 2 messages (one user/assistant pair):

```typescript
while (totalChars > maxTotalChars && this.conversationHistory.length > 2) {
  // Remove the oldest message
}
```

This ensures the model always has some context, even if a single message exceeds the limit.
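A sketch of the same loop (the `trimByChars` helper is illustrative, not an SDK export) shows that even one oversized message leaves the last pair intact:

```typescript
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

// Drop the oldest messages until under the char budget,
// but never below 2 messages (one user/assistant pair).
function trimByChars(history: Message[], maxTotalChars: number): number {
  let total = history.reduce((sum, m) => sum + m.content.length, 0);
  let removed = 0;
  while (total > maxTotalChars && history.length > 2) {
    const msg = history.shift()!;
    total -= msg.content.length;
    removed++;
  }
  return removed;
}

// A 5000-char assistant reply blows the 1000-char budget,
// but the final user/assistant pair is still retained.
const history: Message[] = [
  { role: 'user', content: 'a'.repeat(100) },
  { role: 'assistant', content: 'b'.repeat(200) },
  { role: 'user', content: 'c'.repeat(100) },
  { role: 'assistant', content: 'd'.repeat(5000) },
];
trimByChars(history, 1000); // removes the first 2 messages, then stops
console.log(history.length); // → 2
```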
## Unlimited History

Set both limits to 0 to disable trimming:

```typescript
history: {
  maxMessages: 0,   // Unlimited messages
  maxTotalChars: 0, // Unlimited characters
}
```

Unlimited history can cause:

- Memory leaks in long-running sessions
- Token limit errors when history exceeds the model's context window
- High API costs, since every message is sent on each request

Use unlimited history only for:

- Short-lived sessions (under 10 minutes)
- Testing and development
- Sessions with explicit manual cleanup
## Events

- **`history_trimmed`**: Conversation history was automatically trimmed. Payload:

  ```typescript
  {
    removedCount: number; // Number of messages removed
    reason: 'max_messages' | 'max_total_chars';
  }
  ```

- **`history_cleared`**: Conversation history was manually cleared via `clearHistory()`.
### Listening to Trim Events

```typescript
import { VoiceAgent } from 'voice-agent-ai-sdk';
import { openai } from '@ai-sdk/openai';

const agent = new VoiceAgent({
  model: openai('gpt-4o'),
  history: { maxMessages: 20 },
});

agent.on('history_trimmed', ({ removedCount, reason }) => {
  console.log(`Trimmed ${removedCount} messages (reason: ${reason})`);
  // Optional: log to analytics, notify the user, etc.
});

agent.on('history_cleared', () => {
  console.log('History cleared manually');
});
```
## Manual History Management

### Clear History

Remove all messages:

```typescript
agent.clearHistory();
// Emits the 'history_cleared' event
```
### Get History

Retrieve the current conversation:

```typescript
import type { ModelMessage } from 'ai';

const history: ModelMessage[] = agent.getHistory();

console.log(`${history.length} messages in history`);
history.forEach((msg) => {
  console.log(`${msg.role}: ${msg.content}`);
});
```
### Set History

Restore a conversation from saved state:

```typescript
import type { ModelMessage } from 'ai';

// Save history to a database/file
const savedHistory: ModelMessage[] = agent.getHistory();
await db.save(userId, savedHistory);

// Later: restore history
const restoredHistory = await db.load(userId);
agent.setHistory(restoredHistory);
```
Get History Length
const messageCount = agent . getHistory (). length ;
console . log ( ` ${ messageCount } messages in history` );
## Content Type Handling

The character count includes:

- **String content**: counted directly
- **Multimodal content**: JSON-stringified for counting

```typescript
// String content
{ role: 'user', content: 'Hello!' } // 6 chars

// Multimodal content
{
  role: 'user',
  content: [
    { type: 'text', text: 'Describe this image' },
    { type: 'image', image: 'base64EncodedData...' },
  ]
}
// JSON.stringify(content).length is counted
```

Image data counts toward character limits. Use `maxMessages` instead of `maxTotalChars` for vision-enabled agents to avoid unpredictable trimming.
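A sketch of the counting rule (the `countChars` helper is illustrative; the SDK applies the same string-vs-stringify logic internally):

```typescript
type Content =
  | string
  | Array<{ type: string; text?: string; image?: string }>;

// String content is counted directly; anything else is
// JSON-stringified first, so structure and payloads all count.
function countChars(content: Content): number {
  return typeof content === 'string'
    ? content.length
    : JSON.stringify(content).length;
}

console.log(countChars('Hello!')); // → 6

// Multimodal content includes JSON punctuation plus the full
// base64 image payload, which dwarfs typical text messages:
const imagePart = [{ type: 'image', image: 'AAAA'.repeat(1000) }];
console.log(countChars(imagePart) > 4000); // → true
```

This is why a single image-bearing message can consume most of a `maxTotalChars` budget on its own.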
## Example: Session-Based History

```typescript
import { VoiceAgent } from 'voice-agent-ai-sdk';
import { openai } from '@ai-sdk/openai';
import { WebSocketServer } from 'ws';
import Redis from 'ioredis';

const redis = new Redis();
const wss = new WebSocketServer({ port: 8080 });

interface Session {
  userId: string;
  agent: VoiceAgent;
}

const sessions = new Map<string, Session>();

wss.on('connection', async (socket, req) => {
  const userId = req.headers['user-id'] as string;

  // Load saved history
  const savedHistory = await redis.get(`history:${userId}`);

  const agent = new VoiceAgent({
    model: openai('gpt-4o'),
    history: {
      maxMessages: 50,       // Keep the last 50 messages
      maxTotalChars: 100000, // Or 100k chars
    },
  });

  // Restore history
  if (savedHistory) {
    agent.setHistory(JSON.parse(savedHistory));
    console.log(`Restored ${agent.getHistory().length} messages for ${userId}`);
  }

  agent.handleSocket(socket);
  sessions.set(userId, { userId, agent });

  // Save history periodically
  const saveInterval = setInterval(async () => {
    const history = agent.getHistory();
    await redis.set(`history:${userId}`, JSON.stringify(history));
  }, 30000); // Every 30 seconds

  agent.on('disconnected', async () => {
    clearInterval(saveInterval);

    // Final save
    const history = agent.getHistory();
    await redis.set(`history:${userId}`, JSON.stringify(history));

    agent.destroy();
    sessions.delete(userId);
  });
});
```
## Recommended Configurations

### Short Sessions (5-10 minutes)

```typescript
history: {
  maxMessages: 30,  // ~15 conversation turns
  maxTotalChars: 0, // Unlimited (trimming by count is sufficient)
}
```

### Medium Sessions (30-60 minutes)

```typescript
history: {
  maxMessages: 100,     // ~50 turns
  maxTotalChars: 50000, // ~50k chars
}
```

### Long Sessions (hours)

```typescript
history: {
  maxMessages: 200,      // ~100 turns
  maxTotalChars: 100000, // ~100k chars
}
```

### Vision Agents (VideoAgent)

```typescript
history: {
  maxMessages: 20,  // Images inflate the char count
  maxTotalChars: 0, // Use message count only
}
```

### Cost-Optimized

```typescript
history: {
  maxMessages: 20,      // Fewer messages = lower API cost
  maxTotalChars: 10000, // Strict char limit
}
```
## Token vs. Character Count

The SDK uses character count, not token count, because:

- **Simplicity**: no tokenizer dependency
- **Predictability**: the same for all models
- **Performance**: faster than tokenization

As a rough approximation:

- **GPT models**: ~4 characters = 1 token
- **Claude models**: ~3.5 characters = 1 token

So `maxTotalChars: 50000` ≈ 12,500-14,000 tokens.

For precise token counting, use the model's tokenizer externally:

```typescript
import { encodingForModel } from 'js-tiktoken';

const encoding = encodingForModel('gpt-4');
const tokens = encoding.encode('Your text here');
console.log(`${tokens.length} tokens`);
```
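When a tokenizer dependency is not worth it, the rough ratios above can be wrapped in an estimator. This is a sketch only; the ratios are approximations and `estimateTokens` is not an SDK function:

```typescript
// Approximate chars-per-token ratios (rules of thumb, not exact).
const CHARS_PER_TOKEN = {
  gpt: 4,
  claude: 3.5,
} as const;

// Estimate token usage from a character budget like maxTotalChars.
function estimateTokens(
  chars: number,
  family: keyof typeof CHARS_PER_TOKEN,
): number {
  return Math.round(chars / CHARS_PER_TOKEN[family]);
}

console.log(estimateTokens(50_000, 'gpt'));    // → 12500
console.log(estimateTokens(50_000, 'claude')); // → 14286
```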
Best Practices
Begin with lower limits and increase if needed: history : {
maxMessages : 50 ,
maxTotalChars : 30000 ,
}
Log history_trimmed to understand actual usage: agent . on ( 'history_trimmed' , ({ removedCount , reason }) => {
analytics . track ( 'history_trimmed' , { removedCount , reason });
});
Save history for long sessions
Persist history to database for session restoration: // On disconnect
const history = agent . getHistory ();
await db . saveHistory ( userId , history );
Use maxMessages for vision agents
Image data inflates character counts unpredictably: // VideoAgent config
history : {
maxMessages : 20 , // Use message count
maxTotalChars : 0 , // Disable char limit
}
Clear history on topic change
Allow users to start fresh: // User says "let's talk about something else"
agent . clearHistory ();
## Limitations

- **System messages are not affected by trimming.** The instructions (system prompt) are always included separately and don't count toward limits.
- **Trimming is irreversible.** Once messages are removed, they cannot be recovered unless saved externally.
## Next Steps

- **VoiceAgent**: learn about the voice agent architecture
- **Streaming Speech**: understand speech chunking and generation
- **API Reference**: full VoiceAgent API documentation
- **Quick Start**: build your first voice agent