Version 0.1.0
Released July 15, 2025

Added
Conversation History Limits
New history option with maxMessages (default 100) and maxTotalChars (default unlimited) to prevent unbounded memory growth. Oldest messages are trimmed in pairs to preserve user/assistant turn structure.
A history_trimmed event is emitted when messages are evicted.
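The pair-wise trimming can be sketched as follows; the helper name trimHistory and its shape are illustrative assumptions, not the SDK's actual internals:

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Hypothetical helper: drop the oldest messages two at a time until the
// history fits within maxMessages, so a user turn never loses its reply.
function trimHistory(
  history: Message[],
  maxMessages: number
): { kept: Message[]; removedCount: number } {
  const kept = [...history];
  let removedCount = 0;
  while (kept.length > maxMessages) {
    kept.splice(0, 2); // evict the oldest user/assistant pair
    removedCount += 2;
  }
  return { kept, removedCount };
}
```

Trimming in pairs (rather than one message at a time) keeps the history starting on a user turn, which most chat models expect.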
Audio Input Size Validation
New maxAudioInputSize option (default 10 MB). Oversized or empty audio payloads are rejected early with an error or warning event instead of being forwarded to the transcription model.
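A minimal sketch of this early validation, with illustrative names (the real option is simply passed to the agent's constructor):

```typescript
// Hypothetical sketch of early audio payload validation.
const DEFAULT_MAX_AUDIO_INPUT_SIZE = 10 * 1024 * 1024; // 10 MB

function validateAudioInput(
  payload: Uint8Array,
  maxSize: number = DEFAULT_MAX_AUDIO_INPUT_SIZE
): { ok: boolean; reason?: string } {
  if (payload.length === 0) {
    return { ok: false, reason: "empty audio payload" };
  }
  if (payload.length > maxSize) {
    return { ok: false, reason: "audio payload exceeds maxAudioInputSize" };
  }
  return { ok: true };
}
```

Rejecting before the transcription call avoids spending model time (and money) on payloads that cannot produce a useful transcript.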
Serial Input Queue
sendText(), WebSocket transcript messages, and transcribed audio are now queued and processed one at a time. This prevents race conditions where concurrent calls could corrupt conversationHistory or interleave streaming output.
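One common way to build such a serial queue is promise chaining; this is a self-contained sketch, not the SDK's exact implementation:

```typescript
// Illustrative serial queue: each task starts only after the previous one
// settles, so concurrent inputs can never interleave their processing.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain alive even when a task rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined
    );
    return result;
  }
}
```

Because every task is appended to a single promise chain, a slow transcription cannot race a fast sendText() call for access to conversationHistory.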
LLM Stream Cancellation
An AbortController is now threaded into streamText() via abortSignal. Barge-in, disconnect, and explicit interrupts abort the LLM stream immediately (saving tokens) instead of only cancelling TTS.
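The abort flow can be illustrated with a stand-in token stream (fakeTokenStream and runWithBargeIn are hypothetical; in the SDK the same signal goes to streamText() as its abortSignal option):

```typescript
// Stand-in for an LLM token stream that stops as soon as the signal aborts.
async function* fakeTokenStream(
  tokens: string[],
  signal: AbortSignal
): AsyncGenerator<string> {
  for (const token of tokens) {
    if (signal.aborted) return; // stop generating immediately
    yield token;
  }
}

// Simulate a barge-in two tokens into the response.
async function runWithBargeIn(): Promise<string[]> {
  const controller = new AbortController();
  const received: string[] = [];
  for await (const token of fakeTokenStream(["Hel", "lo", " wor", "ld"], controller.signal)) {
    received.push(token);
    if (received.length === 2) controller.abort(); // user started speaking
  }
  return received;
}
```

The key point is that aborting stops token generation itself, not just playback, so no further tokens are billed after the interrupt.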
interruptCurrentResponse(reason)
New public method that aborts both the LLM stream and ongoing speech in a single call. WebSocket barge-in (transcript / audio / interrupt messages) now uses this instead of interruptSpeech() alone.
destroy() Method
Permanently tears down the agent, releasing the socket, clearing history and tools, and removing all event listeners. A destroyed getter is also exposed. Any subsequent method call throws.
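The destroyed guard might look like this sketch (class and method bodies are illustrative, not the SDK's source):

```typescript
// Illustrative sketch of the destroy() lifecycle guard.
class VoiceAgentSketch {
  private _destroyed = false;

  get destroyed(): boolean {
    return this._destroyed;
  }

  destroy(): void {
    this._destroyed = true;
    // ...release socket, clear history and tools, remove listeners...
  }

  sendText(text: string): void {
    if (this._destroyed) throw new Error("Agent has been destroyed");
    // ...normal processing...
  }
}
```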
history_trimmed Event
Emitted with { removedCount, reason } when the sliding-window trims old messages.
Input Validation
sendText("") now throws, and incoming WebSocket transcript / audio messages are validated before processing.
Changed
disconnect() is Now Full Cleanup
Aborts in-flight LLM and TTS streams, clears the speech queue, rejects pending queued inputs, and removes socket listeners before closing. Previously it only called socket.close().
connect() and handleSocket() are Idempotent
Calling either when a socket is already attached will cleanly tear down the old connection first instead of leaking it.
sendWebSocketMessage() is Resilient
Checks socket.readyState and wraps send() in a try/catch so a socket that closes mid-send does not throw an unhandled exception.
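A sketch of the guarded send, using a minimal socket interface in place of the real WebSocket type (names are illustrative):

```typescript
// Minimal stand-in for the parts of a WebSocket that safeSend needs.
interface WebSocketLike {
  readyState: number;
  send(data: string): void;
}
const OPEN = 1; // WebSocket.OPEN

// Returns false instead of throwing when the socket is gone or dying.
function safeSend(socket: WebSocketLike, message: object): boolean {
  if (socket.readyState !== OPEN) return false; // closed or closing: skip
  try {
    socket.send(JSON.stringify(message));
    return true;
  } catch {
    return false; // socket closed between the check and the send
  }
}
```

The try/catch matters because readyState can change between the check and the send() call, so the check alone is not enough.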
Speech Queue Completion Uses Promise
processUserInput now awaits a speechQueueDonePromise instead of busy-wait polling (while (queue.length) { await sleep(100) }), reducing CPU waste and eliminating a race window.
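The deferred-promise pattern behind this change can be sketched as follows (SpeechQueue and its method names are illustrative):

```typescript
// A deferred: a promise whose resolve function is exposed to the creator.
function createDeferred<T>() {
  let resolve!: (value: T) => void;
  const promise = new Promise<T>((res) => {
    resolve = res;
  });
  return { promise, resolve };
}

// Illustrative queue that resolves a completion promise when drained,
// instead of forcing callers to poll queue.length in a sleep loop.
class SpeechQueue {
  private queue: string[] = [];
  private done = createDeferred<void>();

  push(chunk: string): void {
    this.queue.push(chunk);
  }

  // Called by the playback loop as each chunk finishes.
  finishChunk(): void {
    this.queue.shift();
    if (this.queue.length === 0) this.done.resolve();
  }

  // processUserInput awaits this instead of busy-waiting.
  whenDrained(): Promise<void> {
    return this.done.promise;
  }
}
```

The waiter wakes up on the exact tick the last chunk completes, closing the up-to-100 ms race window the polling loop had.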
interruptSpeech() Resolves Speech-Done Promise
This lets processUserInput proceed immediately after a barge-in instead of potentially hanging.
WebSocket Message Handler Uses if/else if
Prevents a single message from accidentally matching multiple type branches.
Chunk ID Wraps at Number.MAX_SAFE_INTEGER
Avoids unbounded counter growth in very long-running sessions.
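The wraparound is a one-liner; this sketch assumes the counter simply resets to 0 at the limit:

```typescript
// Wrap the chunk ID counter instead of growing past the largest integer
// that can be represented exactly as a JS number.
function nextChunkId(current: number): number {
  return current >= Number.MAX_SAFE_INTEGER ? 0 : current + 1;
}
```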
processUserInput Catch Block Cleans Up Speech State
On stream error, the pending text buffer is cleared and any in-progress speech is interrupted, so the agent does not get stuck in a broken state.
WebSocket Close Handler Calls cleanupOnDisconnect()
Aborts LLM + TTS, clears queues, and rejects pending input promises.
Fixed
- Typo in JSDoc: "Process text deltra" → "Process text delta"
Version 0.0.1
Released July 14, 2025

Added
- Initial release of Voice Agent AI SDK
- Streaming text generation via AI SDK streamText
- Multi-step tool calling with stopWhen
- Chunked streaming TTS with parallel generation and barge-in support
- Audio transcription via AI SDK experimental_transcribe
- WebSocket transport with full stream/tool/speech lifecycle events
- Browser voice client example (example/voice-client.html)
This was the initial release of the SDK, providing the foundation for voice-enabled AI agents with streaming capabilities.