Both VoiceAgent and VideoAgent extend Node.js EventEmitter and emit events throughout the lifecycle of conversation processing. This page documents all available events, their payloads, and when they’re triggered.

Event Categories

  • Text Events - User input and LLM text streaming
  • Speech Events - TTS generation and audio output
  • Tool Events - Tool invocations and results
  • Transcription Events - Audio transcription
  • History Events - Conversation memory management
  • Connection Events - WebSocket lifecycle
  • Video Events - Frame capture and processing (VideoAgent only)
  • Error Events - Errors and warnings

Text Events

Events related to text input and LLM streaming output.

text

Emitted when user input is received or when the full assistant response is ready. Payload:
{
  role: "user" | "assistant";
  text: string;
  hasImage?: boolean; // VideoAgent only
}
When:
  • role: "user" - After user sends text input or audio is transcribed
  • role: "assistant" - After LLM completes full response
Example:
agent.on("text", ({ role, text }) => {
  const prefix = role === "user" ? "πŸ‘€" : "πŸ€–";
  console.log(`${prefix} ${text}`);
});

chunk:text_delta

Emitted for each streaming text token from the LLM. Payload:
{
  id: string;
  text: string;
}
When: During LLM streaming response, once per token. Example:
agent.on("chunk:text_delta", ({ text }) => {
  process.stdout.write(text); // Stream to console
});
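Because each delta carries an id, chunks belonging to one streamed response can be accumulated into the full text. A minimal sketch (the Map-based buffer here is illustrative, not part of the library):

```javascript
// Accumulate streaming deltas into complete responses, keyed by stream id.
const buffers = new Map();

function onTextDelta({ id, text }) {
  buffers.set(id, (buffers.get(id) ?? "") + text);
}

// Wire it up: agent.on("chunk:text_delta", onTextDelta);

// Read back the assembled text for a given stream id.
function assembled(id) {
  return buffers.get(id) ?? "";
}
```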

chunk:reasoning_delta

Emitted for each reasoning token (for models that support reasoning). Payload:
{
  id: string;
  text: string;
}
When: During reasoning phase of models like o1. Example:
agent.on("chunk:reasoning_delta", ({ text }) => {
  console.log("[Reasoning]", text);
});

Speech Events

Events related to text-to-speech generation and audio output.

speech_start

Emitted when TTS generation begins. Payload:
{
  streaming: boolean;
}
When: When the first text chunk is sent to the speech model. Example:
agent.on("speech_start", ({ streaming }) => {
  console.log(`Speech started (streaming: ${streaming})`);
});

speech_complete

Emitted when all TTS chunks have been sent. Payload:
{
  streaming: boolean;
}
When: After all speech chunks are generated and sent to the client. Example:
agent.on("speech_complete", ({ streaming }) => {
  console.log("All speech chunks complete");
});

speech_interrupted

Emitted when speech generation is cancelled. Payload:
{
  reason: string;
}
Common reasons:
  • "interrupted" - Manual interruption via interruptSpeech()
  • "user_speaking" - User started speaking (barge-in)
  • "client_request" - Client sent interrupt message
  • "disconnected" - WebSocket disconnected
Example:
agent.on("speech_interrupted", ({ reason }) => {
  console.log(`Speech interrupted: ${reason}`);
});

speech_chunk_queued

Emitted when a text chunk enters the TTS queue. Payload:
{
  id: number;
  text: string;
}
When: After text is split into chunks and queued for TTS generation. Example:
agent.on("speech_chunk_queued", ({ id, text }) => {
  console.log(`Chunk ${id} queued: "${text.substring(0, 50)}..."`);
});

audio_chunk

Emitted when a single TTS chunk is ready and sent. Payload:
{
  chunkId: number;
  data: string;        // Base64-encoded audio
  format: string;      // e.g., "mp3", "opus"
  text: string;        // Original text for this chunk
  uint8Array: Uint8Array; // Raw audio bytes
}
When: After each chunk is generated by the speech model. Example:
import fs from "node:fs";

agent.on("audio_chunk", ({ chunkId, format, uint8Array, text }) => {
  console.log(`Chunk ${chunkId}: ${uint8Array.length} bytes (${format})`);
  // Save or stream audio
  fs.writeFileSync(`chunk-${chunkId}.${format}`, uint8Array);
});
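The data and uint8Array fields carry the same audio bytes in two encodings. In Node, converting between them is straightforward (a sketch independent of the library):

```javascript
// Convert between the base64 `data` field and the raw `uint8Array` field.
function base64ToBytes(data) {
  return new Uint8Array(Buffer.from(data, "base64"));
}

function bytesToBase64(uint8Array) {
  return Buffer.from(uint8Array).toString("base64");
}
```

This is useful when forwarding chunks over JSON (use data) versus writing them to disk or an audio pipeline (use uint8Array).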

audio

Emitted for full non-streaming TTS audio. Payload:
{
  data: string;        // Base64-encoded audio
  format: string;      // e.g., "mp3", "opus"
  uint8Array: Uint8Array; // Raw audio bytes
}
When: When using generateAndSendSpeechFull() instead of streaming. Example:
import fs from "node:fs";

agent.on("audio", ({ uint8Array, format }) => {
  fs.writeFileSync(`response.${format}`, uint8Array);
});

Tool Events

Events related to AI SDK tool invocations.

chunk:tool_call

Emitted when a tool invocation is detected during streaming. Payload:
{
  toolName: string;
  toolCallId: string;
  input: any; // Tool input parameters
}
When: When LLM decides to call a tool. Example:
agent.on("chunk:tool_call", ({ toolName, toolCallId, input }) => {
  console.log(`Calling tool: ${toolName}`, input);
});

tool_result

Emitted when a tool execution completes. Payload:
{
  name: string;
  toolCallId: string;
  result: any; // Tool output
}
When: After the tool’s execute function finishes. Example:
agent.on("tool_result", ({ name, toolCallId, result }) => {
  console.log(`Tool ${name} result:`, result);
});

Transcription Events

Events related to audio transcription.

transcription

Emitted when audio is successfully transcribed to text. Payload:
{
  text: string;
  language?: string;
}
When: After transcribeAudio() or audio WebSocket message is processed. Example:
agent.on("transcription", ({ text, language }) => {
  console.log(`Transcribed (${language || "unknown"}): ${text}`);
});

audio_received

Emitted when raw audio input is received before transcription. Payload:
{
  size: number; // Audio size in bytes
}
When: After audio WebSocket message arrives, before transcription starts. Example:
agent.on("audio_received", ({ size }) => {
  console.log(`Received ${(size / 1024).toFixed(1)} KB of audio`);
});

History Events

Events related to conversation memory management.

history_cleared

Emitted when conversation history is manually cleared. Payload: None When: After clearHistory() is called. Example:
agent.on("history_cleared", () => {
  console.log("Conversation history cleared");
});

history_trimmed

Emitted when old messages are automatically removed from history. Payload:
{
  removedCount: number;
  reason: "max_messages" | "max_chars";
}
When: When history exceeds maxMessages or maxTotalChars limits. Example:
agent.on("history_trimmed", ({ removedCount, reason }) => {
  console.log(`Trimmed ${removedCount} messages (reason: ${reason})`);
});
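The max_messages case can be pictured as dropping the oldest entries until the history fits the limit. This is a sketch of the trimming behavior, not the library’s actual implementation:

```javascript
// Drop the oldest messages until the history fits within maxMessages,
// returning the same shape as the history_trimmed payload.
function trimHistory(history, maxMessages) {
  const removedCount = Math.max(0, history.length - maxMessages);
  return {
    history: history.slice(removedCount),
    removedCount,
    reason: "max_messages",
  };
}
```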

Connection Events

WebSocket lifecycle events.

connected

Emitted when WebSocket connection is established. Payload: None When: After connect() succeeds or handleSocket() is called. Example:
agent.on("connected", () => {
  console.log("WebSocket connected");
});

disconnected

Emitted when WebSocket connection closes. Payload: None When: When socket closes (client disconnect, network error, disconnect() called). Example:
agent.on("disconnected", () => {
  console.log("WebSocket disconnected");
  agent.destroy(); // Clean up resources
});

Video Events (VideoAgent Only)

Events specific to VideoAgent for video frame processing.

frame_received

Emitted when a video frame is received and processed. Payload:
{
  sequence: number;
  timestamp: number;
  triggerReason: FrameTriggerReason;
  size: number; // Frame size in bytes
  dimensions: {
    width: number;
    height: number;
  };
}
When: After frame passes validation and is added to context buffer. Example:
videoAgent.on("frame_received", ({ sequence, triggerReason, size, dimensions }) => {
  console.log(`Frame ${sequence} received (${triggerReason}): ${dimensions.width}x${dimensions.height}, ${(size / 1024).toFixed(1)} KB`);
});

frame_requested

Emitted when the agent requests the client to capture a frame. Payload:
{
  reason: FrameTriggerReason;
}
When: After requestFrameCapture() is called. Example:
videoAgent.on("frame_requested", ({ reason }) => {
  console.log(`Requesting frame capture: ${reason}`);
});

client_ready

Emitted when client connects and reports capabilities. Payload:
any // Client-reported capabilities object
When: After receiving client_ready WebSocket message. Example:
videoAgent.on("client_ready", (capabilities) => {
  console.log("Client ready with capabilities:", capabilities);
});

config_changed

Emitted when video agent configuration is updated. Payload:
VideoAgentConfig
When: After updateConfig() is called. Example:
videoAgent.on("config_changed", (config) => {
  console.log("Config updated:", config);
});

Error Events

Error and warning events.

error

Emitted when an error occurs in any subsystem. Payload:
Error
Common sources:
  • LLM stream errors
  • TTS generation failures
  • Transcription errors
  • WebSocket errors
  • Invalid input (oversized audio/frames)
Example:
agent.on("error", (error) => {
  console.error("Agent error:", error.message);
  // Handle error gracefully
});

warning

Emitted for non-fatal issues that don’t stop processing. Payload:
string // Warning message
Common warnings:
  • Empty transcript message
  • Invalid audio message
  • Empty video frame
Example:
agent.on("warning", (message) => {
  console.warn("Warning:", message);
});

Listening to Events

Basic Event Handling

const agent = new VoiceAgent({
  model: openai("gpt-4o"),
  // ... other options
});

// Text streaming
agent.on("chunk:text_delta", ({ text }) => {
  process.stdout.write(text);
});

// Speech events
agent.on("speech_start", () => console.log("πŸ”Š Speaking..."));
agent.on("speech_complete", () => console.log("βœ… Done speaking"));

// Tool usage
agent.on("chunk:tool_call", ({ toolName, input }) => {
  console.log(`πŸ”§ Calling ${toolName}:`, input);
});

agent.on("tool_result", ({ name, result }) => {
  console.log(`βœ“ ${name} returned:`, result);
});

// Errors
agent.on("error", (error) => {
  console.error("❌ Error:", error);
});

WebSocket Integration

import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  const agent = new VoiceAgent({
    model: openai("gpt-4o"),
    transcriptionModel: openai.transcription("whisper-1"),
    speechModel: openai.speech("gpt-4o-mini-tts"),
  });

  agent.handleSocket(socket);

  // Forward events to client
  agent.on("text", (data) => {
    socket.send(JSON.stringify({ type: "text", ...data }));
  });

  agent.on("audio_chunk", ({ chunkId, data, format }) => {
    socket.send(JSON.stringify({ 
      type: "audio_chunk", 
      chunkId, 
      data, 
      format 
    }));
  });

  // Cleanup on disconnect
  agent.on("disconnected", () => {
    agent.destroy();
  });
});

VideoAgent Events

const videoAgent = new VideoAgent({
  model: openai("gpt-4o"), // Vision model
  speechModel: openai.speech("gpt-4o-mini-tts"),
});

// Frame events
videoAgent.on("frame_received", ({ sequence, triggerReason }) => {
  console.log(`πŸ“Έ Frame ${sequence} (${triggerReason})`);
});

videoAgent.on("frame_requested", ({ reason }) => {
  console.log(`πŸŽ₯ Requesting frame: ${reason}`);
});

// Multimodal text events include image context
videoAgent.on("text", ({ role, text, hasImage }) => {
  console.log(`${role}: ${text} ${hasImage ? "πŸ“·" : ""}`);
});

Event Timing Diagram

Typical event flow for a user query:
1. User Input
   └─> text (role: "user")

2. LLM Streaming
   β”œβ”€> chunk:text_delta (multiple)
   β”œβ”€> chunk:tool_call (if tools used)
   └─> tool_result (if tools used)

3. Speech Generation
   β”œβ”€> speech_chunk_queued (multiple)
   β”œβ”€> speech_start
   β”œβ”€> audio_chunk (multiple)
   └─> speech_complete

4. Response Complete
   └─> text (role: "assistant")
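The ordering above can be verified by recording event names as they arrive. This sketch simulates the sequence with a bare EventEmitter standing in for a live agent:

```javascript
import { EventEmitter } from "node:events";

// Stand-in for a VoiceAgent; both agents extend EventEmitter.
const agent = new EventEmitter();

// Record the order in which lifecycle events fire.
const order = [];
for (const name of [
  "text", "chunk:text_delta", "speech_chunk_queued",
  "speech_start", "audio_chunk", "speech_complete",
]) {
  agent.on(name, () => order.push(name));
}

// Simulate a typical turn: user text, streamed deltas, then speech.
agent.emit("text", { role: "user", text: "Hi" });
agent.emit("chunk:text_delta", { id: "1", text: "Hello" });
agent.emit("speech_chunk_queued", { id: 0, text: "Hello" });
agent.emit("speech_start", { streaming: true });
agent.emit("audio_chunk", { chunkId: 0 });
agent.emit("speech_complete", { streaming: true });
agent.emit("text", { role: "assistant", text: "Hello" });
```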

See Also

  • Types & Interfaces - Type definitions for event payloads
  • VoiceAgent - Voice agent class reference
  • VideoAgent - Video agent class reference
  • WebSocket Protocol - Complete WebSocket message protocol
