Overview
Both VoiceAgent and VideoAgent accept configuration options for models, speech synthesis, transcription, history management, and more. This guide covers the available configuration fields with their types, defaults, and usage examples.
VoiceAgent Configuration
Required Options
AI SDK language model for chat generation. Use any AI SDK provider model.
Optional Models
AI SDK transcription model for speech-to-text. Required for audio input processing.
AI SDK speech model for text-to-speech generation.
System Configuration
System prompt that defines the agent’s behavior and personality.
Stopping condition for multi-step tool execution loops. Controls when the agent stops calling tools.
AI SDK tools map for function calling. See Tool Integration for details.
Speech Configuration
TTS voice ID. For OpenAI: alloy, echo, fable, onyx, nova, shimmer.
Style instructions passed to the speech model for tone and delivery.
Audio output format: mp3, opus, aac, wav, pcm, etc.
Fine-tune streaming TTS behavior for low-latency audio.
StreamingSpeechConfig Fields:
| Field | Type | Default | Description |
|---|---|---|---|
| minChunkSize | number | 50 | Minimum characters before generating speech |
| maxChunkSize | number | 200 | Maximum characters per chunk (splits at sentence boundary) |
| parallelGeneration | boolean | true | Generate TTS for next chunks while current plays |
| maxParallelRequests | number | 3 | Maximum concurrent TTS requests |
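To make the interaction between minChunkSize and maxChunkSize more concrete, here is a minimal sketch of sentence-boundary chunking. The field names and defaults come from the table above; the `chunkForSpeech` helper and its fallback rules (split at the last space, then hard-cut) are assumptions for illustration and are not taken from the library's implementation.

```typescript
// Field names and defaults mirror the StreamingSpeechConfig table.
// The chunking logic itself is an illustrative assumption.
interface ChunkLimits {
  minChunkSize: number;
  maxChunkSize: number;
}

const DEFAULT_LIMITS: ChunkLimits = { minChunkSize: 50, maxChunkSize: 200 };

// Hypothetical helper: split text into chunks that could each be sent to TTS.
function chunkForSpeech(text: string, limits: ChunkLimits = DEFAULT_LIMITS): string[] {
  const chunks: string[] = [];
  let rest = text.trim();

  while (rest.length > limits.maxChunkSize) {
    const head = rest.slice(0, limits.maxChunkSize);

    // Prefer the last sentence end that leaves at least minChunkSize characters.
    let cut = -1;
    for (let i = head.length - 1; i >= 0 && i >= limits.minChunkSize - 1; i--) {
      if (".!?".includes(head.charAt(i))) {
        cut = i + 1;
        break;
      }
    }

    // No usable sentence boundary: fall back to the last space, then a hard cut.
    if (cut === -1) {
      const space = head.lastIndexOf(" ");
      cut = space > 0 ? space : limits.maxChunkSize;
    }

    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }

  // The final remainder may be shorter than minChunkSize.
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

// Example with small limits so the splits are easy to see.
const exampleChunks = chunkForSpeech(
  "Hello there. How are you today? Fine.",
  { minChunkSize: 5, maxChunkSize: 20 },
);
```

Smaller chunks generally mean audio can start sooner, while larger chunks tend to give the speech model more context for natural prosody, so the two limits are a latency versus quality trade-off.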
History Management
Configure conversation history limits to manage memory and context window usage.
HistoryConfig Fields:
| Field | Type | Default | Description |
|---|---|---|---|
| maxMessages | number | 100 | Max messages in history (0 = unlimited) |
| maxTotalChars | number | 0 | Max total characters across all messages (0 = unlimited) |
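As a rough illustration of how these two limits could be enforced together, the sketch below drops the oldest user + assistant pair until both limits are satisfied. The `HistoryConfig` field names come from the table; the `ChatMessage` shape and the `trimHistory` helper are hypothetical and only approximate the behavior this guide describes.

```typescript
// Field names follow the HistoryConfig table; 0 means "unlimited".
interface HistoryConfig {
  maxMessages: number;
  maxTotalChars: number;
}

// Assumed message shape for illustration only.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical helper: returns the trimmed history and how many messages were removed.
function trimHistory(
  messages: ChatMessage[],
  config: HistoryConfig,
): { kept: ChatMessage[]; removedCount: number } {
  const kept = [...messages];

  const totalChars = (): number =>
    kept.reduce((sum, m) => sum + m.content.length, 0);

  const overLimit = (): boolean =>
    (config.maxMessages > 0 && kept.length > config.maxMessages) ||
    (config.maxTotalChars > 0 && totalChars() > config.maxTotalChars);

  // Remove the two oldest messages at a time so turns stay paired.
  while (kept.length > 0 && overLimit()) {
    kept.splice(0, 2);
  }

  return { kept, removedCount: messages.length - kept.length };
}
```

A caller could use `removedCount` to decide whether to report that trimming happened, for example when logging or surfacing an event to the UI.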
When limits are exceeded, the oldest messages are trimmed in pairs (user + assistant) to preserve conversation turns. The agent emits a history_trimmed event with details.
Size Limits
Maximum audio input size in bytes (default: 10 MB). Rejects larger audio inputs.
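A size guard of this kind can be pictured as a simple byte-length check before any transcription work starts. The sketch below is an assumption-laden illustration: the constant and function names are hypothetical, and it assumes "10 MB" means 10 × 1024 × 1024 bytes, which the source does not specify.

```typescript
// Assumption: "10 MB" is interpreted as 10 MiB; the library may define it differently.
const DEFAULT_MAX_AUDIO_INPUT_BYTES = 10 * 1024 * 1024;

type SizeCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical helper: reject audio payloads larger than the configured limit.
function checkAudioInputSize(
  byteLength: number,
  maxBytes: number = DEFAULT_MAX_AUDIO_INPUT_BYTES,
): SizeCheck {
  if (byteLength > maxBytes) {
    return {
      ok: false,
      reason: `Audio input of ${byteLength} bytes exceeds the limit of ${maxBytes} bytes`,
    };
  }
  return { ok: true };
}
```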
WebSocket Configuration
Default WebSocket URL for the connect() method. Optional for text-only usage.
VideoAgent Configuration
VideoAgent extends VoiceAgent with video-specific options:
Maximum frame input size in bytes (default: 5 MB).
Maximum number of frames to keep in context buffer for visual conversation history.
Session ID for this video agent instance. Auto-generated if not provided.
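The frame limit above can be thought of as a bounded buffer. The sketch below shows one way such a buffer might work, assuming the oldest frames are evicted first; the source does not state the eviction order, and the `FrameBuffer` class is hypothetical rather than part of the VideoAgent API.

```typescript
// Hypothetical bounded buffer: keeps only the most recent `maxFrames` items.
class FrameBuffer<T> {
  private frames: T[] = [];
  private readonly maxFrames: number;

  constructor(maxFrames: number) {
    this.maxFrames = maxFrames;
  }

  // Add a frame and evict the oldest ones if the buffer is over capacity.
  push(frame: T): void {
    this.frames.push(frame);
    while (this.frames.length > this.maxFrames) {
      this.frames.shift();
    }
  }

  // Return a copy so callers cannot mutate the internal state.
  snapshot(): T[] {
    return [...this.frames];
  }
}

// Usage: with a capacity of 2, only the two newest frames remain.
const frameBuffer = new FrameBuffer<string>(2);
frameBuffer.push("frame-1");
frameBuffer.push("frame-2");
frameBuffer.push("frame-3");
```

Keeping the buffer small limits how much image data is sent to the model on each turn, at the cost of a shorter visual history.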
Important: VideoAgent requires a vision-enabled model to process video frames:
- OpenAI: gpt-4o, gpt-4o-mini
- Anthropic: claude-3.5-sonnet, claude-3-opus
- Google: gemini-1.5-pro, gemini-1.5-flash
Complete Example
Runtime Configuration Updates
Updating Tools
You can add or update tools after initialization.
VideoAgent Configuration
VideoAgent supports runtime config updates.
Environment Variables Pattern
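One way to keep secrets out of source code is to read them from the environment once and fail fast when a required value is missing. The sketch below is illustrative only: `OPENAI_API_KEY` follows the usual OpenAI convention, while `VOICE_WS_ENDPOINT`, `AgentSecrets`, and `loadAgentEnv` are hypothetical names not taken from this guide.

```typescript
// A plain record keeps the helper testable without depending on Node typings.
type Env = Record<string, string | undefined>;

// Hypothetical shape for the sensitive values an agent might need.
interface AgentSecrets {
  apiKey: string;
  endpoint: string | undefined;
}

// Hypothetical helper: validate required variables up front.
function loadAgentEnv(env: Env): AgentSecrets {
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is not set");
  }
  // VOICE_WS_ENDPOINT is an assumed variable name; it is optional here.
  return { apiKey, endpoint: env["VOICE_WS_ENDPOINT"] };
}
```

In a Node.js process this would typically be called as `loadAgentEnv(process.env)`, with the returned values passed into the agent configuration.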
A common pattern is to read sensitive configuration from environment variables instead of hard-coding it.
Next Steps
Tool Integration
Learn how to integrate AI SDK tools for function calling
Browser Client
Build a real-time voice interface in the browser