What You’ll Build
By the end of this guide, you’ll have a working voice agent that:
- Processes text input and streams responses
- Calls tools (like weather lookup) during conversations
- Generates streaming audio responses
- Supports WebSocket connections for real-time voice interaction
Install the SDK
Install voice-agent-ai-sdk and the AI SDK with your preferred provider:
The SDK is built on the Vercel AI SDK, giving you access to multiple LLM providers, tools, and streaming capabilities.
Set up environment variables
Create a .env file in your project root with your API keys:
The VOICE_WS_ENDPOINT variable is only needed if you want real-time voice interaction over WebSocket. For text-only usage, you can skip it.
Define tools for your agent
Tools allow your agent to fetch real-time data or perform actions. Define them using the AI SDK’s tool function:
These tools will be automatically called when the LLM determines they’re needed to answer the user’s query.
Initialize the VoiceAgent
Create a new VoiceAgent instance with your desired configuration:
The agent handles the entire voice interaction lifecycle: text streaming, tool calling, and speech synthesis.
Set up event listeners
Listen to events to track the agent’s activity and handle responses:
The SDK emits events at every stage of processing, giving you full visibility into the agent’s behavior.
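The listener pattern can be sketched as follows, assuming the agent exposes a Node-style `on(event, handler)` method. `StubAgent` is a hypothetical stand-in so the snippet runs without the SDK installed. Only the `chunk:text_delta` event name comes from this guide; the `{ text }` payload shape is an assumption, so check the Events Reference for the real payloads.

```typescript
// Hypothetical stand-in for the real agent so this sketch runs on its own.
// Only the event name "chunk:text_delta" comes from this guide; the
// payload shape { text: string } is an assumption.
type DeltaHandler = (payload: { text: string }) => void;

class StubAgent {
  private handlers = new Map<string, DeltaHandler[]>();

  // Register a listener for a named event.
  on(event: string, handler: DeltaHandler): this {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
    return this;
  }

  // Deliver a payload to every listener registered for the event.
  emit(event: string, payload: { text: string }): void {
    for (const handler of this.handlers.get(event) ?? []) handler(payload);
  }
}

const stubAgent = new StubAgent();
let transcript = "";

// Accumulate streamed tokens into the full reply as they arrive.
stubAgent.on("chunk:text_delta", ({ text }) => {
  transcript += text;
});

// Simulate the agent streaming three tokens.
for (const token of ["It is ", "sunny ", "today."]) {
  stubAgent.emit("chunk:text_delta", { text: token });
}

console.log(transcript); // prints "It is sunny today."
```

With the real agent, the same handler body would typically append each delta to a UI element or a buffer instead of a local string.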
Send your first message
Send a text message to the agent and get a streaming response:
The agent will:
- Add your message to the conversation history
- Stream text tokens in real-time via chunk:text_delta events
- Detect that it needs weather data and call the getWeather tool
- Generate a response incorporating the tool result
- Convert the response to speech in parallel chunks
- Emit audio chunks as they’re generated
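The steps above can be pictured with a toy simulation. This is not the SDK's implementation: `runTurn`, the log strings, the keyword-based tool decision, and the canned weather data are all hypothetical, and the ordering is simplified (the real agent streams text while deciding on tool calls and synthesizes speech in parallel).

```typescript
// Toy simulation of one conversation turn. Everything here is hypothetical
// and exists only to make the sequence of steps concrete.
type ChatMessage = { role: "user" | "assistant"; content: string };

// Stand-in for the getWeather tool, returning canned data.
function getWeather(location: string): { location: string; tempC: number } {
  return { location, tempC: 21 };
}

function runTurn(chatHistory: ChatMessage[], userText: string): string[] {
  const steps: string[] = [];

  // Add the user message to the conversation history.
  chatHistory.push({ role: "user", content: userText });

  // A real model decides when a tool is needed; this sketch fakes that
  // decision with a simple pattern match.
  let reply = "This sketch only answers weather questions.";
  const match = /weather in (\w+)/i.exec(userText);
  if (match) {
    steps.push("tool:getWeather");
    const result = getWeather(match[1] ?? "unknown");
    // Incorporate the tool result into the reply.
    reply = `It is ${result.tempC} degrees Celsius in ${result.location}.`;
  }

  // Stream the reply word by word, standing in for text deltas.
  for (const word of reply.split(" ")) steps.push(`text:${word}`);

  // Speech synthesis is only logged here, not performed.
  steps.push("audio:1");

  chatHistory.push({ role: "assistant", content: reply });
  return steps;
}

const chatHistory: ChatMessage[] = [];
const turnLog = runTurn(chatHistory, "What's the weather in Paris?");
console.log(turnLog[0]); // prints "tool:getWeather"
console.log(chatHistory[1]?.content); // prints "It is 21 degrees Celsius in Paris."
```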
Optional: Connect to WebSocket for real-time voice
For real-time voice interaction, connect to a WebSocket server:
The WebSocket protocol supports:
- Text transcripts from browser speech recognition
- Audio data for server-side transcription with Whisper
- Interruptions to cancel ongoing responses (barge-in)
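One way to picture these three message kinds is as a discriminated union that a server routes on. The guide does not specify the wire format, so the `type` values and field names below are assumptions for illustration, not the SDK's actual protocol.

```typescript
// Hypothetical message shapes: the guide names the three capabilities but
// not the wire format, so every field name below is an assumption.
type ClientMessage =
  | { type: "transcript"; text: string } // text from browser speech recognition
  | { type: "audio"; data: string }      // base64 audio for server-side transcription
  | { type: "interrupt" };               // barge-in: cancel the current response

// Decide what a server would do with each incoming message.
function describeMessage(raw: string): string {
  const msg = JSON.parse(raw) as ClientMessage;
  switch (msg.type) {
    case "transcript":
      return `send text to agent: ${msg.text}`;
    case "audio":
      return `transcribe ${msg.data.length} base64 chars, then send to agent`;
    case "interrupt":
      return "cancel the in-flight response";
    default:
      return "ignore unknown message";
  }
}

console.log(describeMessage(JSON.stringify({ type: "transcript", text: "hello" })));
// prints "send text to agent: hello"
```

Routing on a single `type` field keeps barge-in cheap to handle: an interrupt carries no payload, so the server can cancel the ongoing response immediately.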
Expected Output
When you run the code above, you’ll see output like this:
Complete Example
Here’s the full working example you can copy and run:
Next Steps
Now that you have a working voice agent, explore more advanced features:
Configuration Guide
Fine-tune streaming speech, memory limits, and audio settings
Events Reference
Complete list of all events and their payloads
VoiceAgent API
Full API reference for methods and properties
Examples
More examples including WebSocket servers and browser clients