Both VoiceAgent and VideoAgent extend Node.js EventEmitter and emit events throughout the lifecycle of conversation processing. This page documents all available events, their payloads, and when they’re triggered.

Event Categories

  • Text Events - User input and LLM text streaming
  • Speech Events - TTS generation and audio output
  • Tool Events - Tool invocations and results
  • Transcription Events - Audio transcription
  • History Events - Conversation memory management
  • Connection Events - WebSocket lifecycle
  • Video Events - Frame capture and processing (VideoAgent only)
  • Error Events - Errors and warnings

Text Events

Events related to text input and LLM streaming output.

text

Emitted when user input is received or when the full assistant response is ready. Payload:
{
  role: "user" | "assistant";
  text: string;
  hasImage?: boolean; // VideoAgent only
}
When:
  • role: "user" - After user sends text input or audio is transcribed
  • role: "assistant" - After LLM completes full response
Example:
agent.on("text", ({ role, text }) => {
  const prefix = role === "user" ? "πŸ‘€" : "πŸ€–";
  console.log(`${prefix} ${text}`);
});

chunk:text_delta

Emitted for each streaming text token from the LLM. Payload:
{
  id: string;
  text: string;
}
When: During LLM streaming response, once per token. Example:
agent.on("chunk:text_delta", ({ text }) => {
  process.stdout.write(text); // Stream to console
});
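Because each delta carries an id, chunks belonging to one streamed response can be accumulated into the full text. A minimal sketch (the Map-based buffer here is illustrative, not part of the library):

```javascript
// Accumulate streaming deltas into complete responses, keyed by stream id.
const buffers = new Map();

function onTextDelta({ id, text }) {
  buffers.set(id, (buffers.get(id) ?? "") + text);
}

// Wire it up: agent.on("chunk:text_delta", onTextDelta);

// Read back the assembled text for a given stream id.
function assembled(id) {
  return buffers.get(id) ?? "";
}
```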

chunk:reasoning_delta

Emitted for each reasoning token (for models that support reasoning). Payload:
{
  id: string;
  text: string;
}
When: During reasoning phase of models like o1. Example:
agent.on("chunk:reasoning_delta", ({ text }) => {
  console.log("[Reasoning]", text);
});

Speech Events

Events related to text-to-speech generation and audio output.

speech_start

Emitted when TTS generation begins. Payload:
{
  streaming: boolean;
}
When: When the first text chunk is sent to the speech model. Example:
agent.on("speech_start", ({ streaming }) => {
  console.log(`Speech started (streaming: ${streaming})`);
});

speech_complete

Emitted when all TTS chunks have been sent. Payload:
{
  streaming: boolean;
}
When: After all speech chunks are generated and sent to the client. Example:
agent.on("speech_complete", ({ streaming }) => {
  console.log("All speech chunks complete");
});

speech_interrupted

Emitted when speech generation is cancelled. Payload:
{
  reason: string;
}
Common reasons:
  • "interrupted" - Manual interruption via interruptSpeech()
  • "user_speaking" - User started speaking (barge-in)
  • "client_request" - Client sent interrupt message
  • "disconnected" - WebSocket disconnected
Example:
agent.on("speech_interrupted", ({ reason }) => {
  console.log(`Speech interrupted: ${reason}`);
});

speech_chunk_queued

Emitted when a text chunk enters the TTS queue. Payload:
{
  id: number;
  text: string;
}
When: After text is split into chunks and queued for TTS generation. Example:
agent.on("speech_chunk_queued", ({ id, text }) => {
  console.log(`Chunk ${id} queued: "${text.substring(0, 50)}..."`);
});

audio_chunk

Emitted when a single TTS chunk is ready and sent. Payload:
{
  chunkId: number;
  data: string;        // Base64-encoded audio
  format: string;      // e.g., "mp3", "opus"
  text: string;        // Original text for this chunk
  uint8Array: Uint8Array; // Raw audio bytes
}
When: After each chunk is generated by the speech model. Example:
import fs from "node:fs";

agent.on("audio_chunk", ({ chunkId, format, uint8Array, text }) => {
  console.log(`Chunk ${chunkId}: ${uint8Array.length} bytes (${format})`);
  // Save or stream audio
  fs.writeFileSync(`chunk-${chunkId}.${format}`, uint8Array);
});
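The data and uint8Array fields carry the same audio bytes in two encodings. In Node, converting between them is straightforward (a sketch independent of the library):

```javascript
// Convert between the base64 `data` field and the raw `uint8Array` field.
function base64ToBytes(data) {
  return new Uint8Array(Buffer.from(data, "base64"));
}

function bytesToBase64(uint8Array) {
  return Buffer.from(uint8Array).toString("base64");
}
```

This is useful when forwarding chunks over JSON (use data) versus writing them to disk or an audio pipeline (use uint8Array).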

audio

Emitted for full non-streaming TTS audio. Payload:
{
  data: string;        // Base64-encoded audio
  format: string;      // e.g., "mp3", "opus"
  uint8Array: Uint8Array; // Raw audio bytes
}
When: When using generateAndSendSpeechFull() instead of streaming. Example:
import fs from "node:fs";

agent.on("audio", ({ uint8Array, format }) => {
  fs.writeFileSync(`response.${format}`, uint8Array);
});

Tool Events

Events related to AI SDK tool invocations.

chunk:tool_call

Emitted when a tool invocation is detected during streaming. Payload:
{
  toolName: string;
  toolCallId: string;
  input: any; // Tool input parameters
}
When: When LLM decides to call a tool. Example:
agent.on("chunk:tool_call", ({ toolName, toolCallId, input }) => {
  console.log(`Calling tool: ${toolName}`, input);
});

tool_result

Emitted when a tool execution completes. Payload:
{
  name: string;
  toolCallId: string;
  result: any; // Tool output
}
When: After the tool’s execute function finishes. Example:
agent.on("tool_result", ({ name, toolCallId, result }) => {
  console.log(`Tool ${name} result:`, result);
});

Transcription Events

Events related to audio transcription.

transcription

Emitted when audio is successfully transcribed to text. Payload:
{
  text: string;
  language?: string;
}
When: After transcribeAudio() or audio WebSocket message is processed. Example:
agent.on("transcription", ({ text, language }) => {
  console.log(`Transcribed (${language || "unknown"}): ${text}`);
});

audio_received

Emitted when raw audio input is received before transcription. Payload:
{
  size: number; // Audio size in bytes
}
When: After audio WebSocket message arrives, before transcription starts. Example:
agent.on("audio_received", ({ size }) => {
  console.log(`Received ${(size / 1024).toFixed(1)} KB of audio`);
});

History Events

Events related to conversation memory management.

history_cleared

Emitted when conversation history is manually cleared. Payload: None When: After clearHistory() is called. Example:
agent.on("history_cleared", () => {
  console.log("Conversation history cleared");
});

history_trimmed

Emitted when old messages are automatically removed from history. Payload:
{
  removedCount: number;
  reason: "max_messages" | "max_chars";
}
When: When history exceeds maxMessages or maxTotalChars limits. Example:
agent.on("history_trimmed", ({ removedCount, reason }) => {
  console.log(`Trimmed ${removedCount} messages (reason: ${reason})`);
});
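The max_messages case can be pictured as dropping the oldest entries until the history fits the limit. This is a sketch of the trimming behavior, not the library’s actual implementation:

```javascript
// Drop the oldest messages until the history fits within maxMessages,
// returning the same shape as the history_trimmed payload.
function trimHistory(history, maxMessages) {
  const removedCount = Math.max(0, history.length - maxMessages);
  return {
    history: history.slice(removedCount),
    removedCount,
    reason: "max_messages",
  };
}
```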

Connection Events

WebSocket lifecycle events.

connected

Emitted when WebSocket connection is established. Payload: None When: After connect() succeeds or handleSocket() is called. Example:
agent.on("connected", () => {
  console.log("WebSocket connected");
});

disconnected

Emitted when WebSocket connection closes. Payload: None When: When socket closes (client disconnect, network error, disconnect() called). Example:
agent.on("disconnected", () => {
  console.log("WebSocket disconnected");
  agent.destroy(); // Clean up resources
});

Video Events (VideoAgent Only)

Events specific to VideoAgent for video frame processing.

frame_received

Emitted when a video frame is received and processed. Payload:
{
  sequence: number;
  timestamp: number;
  triggerReason: FrameTriggerReason;
  size: number; // Frame size in bytes
  dimensions: {
    width: number;
    height: number;
  };
}
When: After frame passes validation and is added to context buffer. Example:
videoAgent.on("frame_received", ({ sequence, triggerReason, size, dimensions }) => {
  console.log(`Frame ${sequence} received (${triggerReason}): ${dimensions.width}x${dimensions.height}, ${(size / 1024).toFixed(1)} KB`);
});

frame_requested

Emitted when the agent requests the client to capture a frame. Payload:
{
  reason: FrameTriggerReason;
}
When: After requestFrameCapture() is called. Example:
videoAgent.on("frame_requested", ({ reason }) => {
  console.log(`Requesting frame capture: ${reason}`);
});

client_ready

Emitted when client connects and reports capabilities. Payload:
any // Client-reported capabilities object
When: After receiving client_ready WebSocket message. Example:
videoAgent.on("client_ready", (capabilities) => {
  console.log("Client ready with capabilities:", capabilities);
});

config_changed

Emitted when video agent configuration is updated. Payload:
VideoAgentConfig
When: After updateConfig() is called. Example:
videoAgent.on("config_changed", (config) => {
  console.log("Config updated:", config);
});

Error Events

Error and warning events.

error

Emitted when an error occurs in any subsystem. Payload:
Error
Common sources:
  • LLM stream errors
  • TTS generation failures
  • Transcription errors
  • WebSocket errors
  • Invalid input (oversized audio/frames)
Example:
agent.on("error", (error) => {
  console.error("Agent error:", error.message);
  // Handle error gracefully
});

warning

Emitted for non-fatal issues that don’t stop processing. Payload:
string // Warning message
Common warnings:
  • Empty transcript message
  • Invalid audio message
  • Empty video frame
Example:
agent.on("warning", (message) => {
  console.warn("Warning:", message);
});

Listening to Events

Basic Event Handling

const agent = new VoiceAgent({
  model: openai("gpt-4o"),
  // ... other options
});

// Text streaming
agent.on("chunk:text_delta", ({ text }) => {
  process.stdout.write(text);
});

// Speech events
agent.on("speech_start", () => console.log("πŸ”Š Speaking..."));
agent.on("speech_complete", () => console.log("βœ… Done speaking"));

// Tool usage
agent.on("chunk:tool_call", ({ toolName, input }) => {
  console.log(`πŸ”§ Calling ${toolName}:`, input);
});

agent.on("tool_result", ({ name, result }) => {
  console.log(`βœ“ ${name} returned:`, result);
});

// Errors
agent.on("error", (error) => {
  console.error("❌ Error:", error);
});

WebSocket Integration

import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  const agent = new VoiceAgent({
    model: openai("gpt-4o"),
    transcriptionModel: openai.transcription("whisper-1"),
    speechModel: openai.speech("gpt-4o-mini-tts"),
  });

  agent.handleSocket(socket);

  // Forward events to client
  agent.on("text", (data) => {
    socket.send(JSON.stringify({ type: "text", ...data }));
  });

  agent.on("audio_chunk", ({ chunkId, data, format }) => {
    socket.send(JSON.stringify({ 
      type: "audio_chunk", 
      chunkId, 
      data, 
      format 
    }));
  });

  // Cleanup on disconnect
  agent.on("disconnected", () => {
    agent.destroy();
  });
});

VideoAgent Events

const videoAgent = new VideoAgent({
  model: openai("gpt-4o"), // Vision model
  speechModel: openai.speech("gpt-4o-mini-tts"),
});

// Frame events
videoAgent.on("frame_received", ({ sequence, triggerReason }) => {
  console.log(`πŸ“Έ Frame ${sequence} (${triggerReason})`);
});

videoAgent.on("frame_requested", ({ reason }) => {
  console.log(`πŸŽ₯ Requesting frame: ${reason}`);
});

// Multimodal text events include image context
videoAgent.on("text", ({ role, text, hasImage }) => {
  console.log(`${role}: ${text} ${hasImage ? "πŸ“·" : ""}`);
});

Event Timing Diagram

Typical event flow for a user query:
1. User Input
   └─> text (role: "user")

2. LLM Streaming
   β”œβ”€> chunk:text_delta (multiple)
   β”œβ”€> chunk:tool_call (if tools used)
   └─> tool_result (if tools used)

3. Speech Generation
   β”œβ”€> speech_chunk_queued (multiple)
   β”œβ”€> speech_start
   β”œβ”€> audio_chunk (multiple)
   └─> speech_complete

4. Response Complete
   └─> text (role: "assistant")
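The ordering above can be verified by recording event names as they arrive. This sketch simulates the sequence with a bare EventEmitter standing in for a live agent:

```javascript
import { EventEmitter } from "node:events";

// Stand-in for a VoiceAgent; both agents extend EventEmitter.
const agent = new EventEmitter();

// Record the order in which lifecycle events fire.
const order = [];
for (const name of [
  "text", "chunk:text_delta", "speech_chunk_queued",
  "speech_start", "audio_chunk", "speech_complete",
]) {
  agent.on(name, () => order.push(name));
}

// Simulate a typical turn: user text, streamed deltas, then speech.
agent.emit("text", { role: "user", text: "Hi" });
agent.emit("chunk:text_delta", { id: "1", text: "Hello" });
agent.emit("speech_chunk_queued", { id: 0, text: "Hello" });
agent.emit("speech_start", { streaming: true });
agent.emit("audio_chunk", { chunkId: 0 });
agent.emit("speech_complete", { streaming: true });
agent.emit("text", { role: "assistant", text: "Hello" });
```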

See Also

  • Types & Interfaces - Type definitions for event payloads
  • VoiceAgent - Voice agent class reference
  • VideoAgent - Video agent class reference
  • WebSocket Protocol - Complete WebSocket message protocol
