
Overview

Server events are messages sent from the Unmute backend to the client. These events stream audio responses, transcriptions, and status updates.
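Since every server event carries a type field, a client can route messages through a small lookup table. The following is a minimal sketch of that idea; createEventRouter and the handler map are hypothetical names chosen for illustration, not part of the Unmute API.

```javascript
// Hypothetical helper: routes a raw JSON message to a handler keyed by its "type".
function createEventRouter(handlers, onUnknown = () => {}) {
  return function route(rawMessage) {
    const message = JSON.parse(rawMessage);
    // Own-property check avoids matching inherited keys such as "constructor".
    if (Object.prototype.hasOwnProperty.call(handlers, message.type)) {
      return handlers[message.type](message);
    }
    return onUnknown(message);
  };
}

// Usage: record the types that have a handler, ignore everything else.
const seen = [];
const route = createEventRouter({
  'response.created': (m) => seen.push(m.type),
  'response.audio.done': (m) => seen.push(m.type),
});
route('{"type":"response.created","event_id":"event_1","response":{}}');
route('{"type":"unmute.additional_outputs","event_id":"event_2","args":{}}');
```

Events without a registered handler fall through to onUnknown, so new event types do not break the client.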

session.updated

Confirms that the session configuration was successfully updated.

Response Fields

type
string
required
Always "session.updated"
event_id
string
required
Unique event identifier
session
object
required
The updated session configuration (mirrors the session.update request)

Example

{
  "type": "session.updated",
  "event_id": "event_DEF456uvw",
  "session": {
    "instructions": {
      "character": "You are a helpful AI assistant.",
      "scenario": "Casual conversation"
    },
    "voice": "default",
    "allow_recording": false
  }
}

response.created

Indicates that the assistant has started generating a response.

Response Fields

type
string
required
Always "response.created"
event_id
string
required
Unique event identifier
response
object
required
Response metadata
response.object
string
required
Always "realtime.response"
response.status
string
required
Response status. One of: "in_progress", "completed", "cancelled", "failed", "incomplete"
response.voice
string
required
Voice identifier being used for this response
response.chat_history
array
default:"[]"
Conversation history (array of message objects)

Example

{
  "type": "response.created",
  "event_id": "event_GHI789rst",
  "response": {
    "object": "realtime.response",
    "status": "in_progress",
    "voice": "default",
    "chat_history": []
  }
}
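
A client may want to fold this event into its own state. The sketch below summarizes a response.created event and checks the status against the values listed above; summarizeResponseCreated is a hypothetical helper name, and treating a missing chat_history as an empty array follows the documented default.

```javascript
// Status values listed in the response.status field description.
const KNOWN_STATUSES = new Set([
  'in_progress', 'completed', 'cancelled', 'failed', 'incomplete',
]);

// Hypothetical helper: reduce a response.created event to a small state object.
function summarizeResponseCreated(message) {
  if (message.type !== 'response.created') {
    throw new Error(`Unexpected event type: ${message.type}`);
  }
  const { status, voice, chat_history: chatHistory = [] } = message.response;
  return {
    status,
    knownStatus: KNOWN_STATUSES.has(status),
    voice,
    historyLength: chatHistory.length,
  };
}
```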

response.audio.delta

Streams generated speech audio to the client.

Response Fields

type
string
required
Always "response.audio.delta"
event_id
string
required
Unique event identifier
delta
string
required
Base64-encoded Opus audio chunk

Audio Specifications:
  • Codec: Opus
  • Sample Rate: 24 kHz
  • Channels: Mono
  • Encoding: Base64 string

Example

{
  "type": "response.audio.delta",
  "event_id": "event_JKL012mno",
  "delta": "T2dnUwACAAAAAAAAAAAljMlAAAAABCmR7kBE09w..."
}

Implementation Notes

  • Audio chunks are sent as they become available from the text-to-speech system
  • Because the Opus encoder buffers its input, not every PCM chunk from the text-to-speech system produces an audio delta
  • Chunks should be decoded and played in sequence

JavaScript Example

// Assumes `ws` is an open WebSocket connected to the Unmute backend
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  if (message.type === 'response.audio.delta') {
    // Decode the base64 payload into raw bytes (still Opus-encoded)
    const audioData = atob(message.delta);
    const audioBytes = Uint8Array.from(audioData, (c) => c.charCodeAt(0));

    // playOpusAudio is a placeholder for your own decoder/player,
    // e.g. an Opus decoder feeding the Web Audio API
    playOpusAudio(audioBytes);
  }
};

response.audio.done

Indicates that audio streaming for the current response has completed.

Response Fields

type
string
required
Always "response.audio.done"
event_id
string
required
Unique event identifier

Example

{
  "type": "response.audio.done",
  "event_id": "event_MNO345pqr"
}

response.text.delta

Streams the text being generated (for display or debugging).

Response Fields

type
string
required
Always "response.text.delta"
event_id
string
required
Unique event identifier
delta
string
required
Text chunk being generated

Example

{
  "type": "response.text.delta",
  "event_id": "event_PQR678stu",
  "delta": "Hello! How can I "
}

response.text.done

Indicates that text generation is complete and provides the full text.

Response Fields

type
string
required
Always "response.text.done"
event_id
string
required
Unique event identifier
text
string
required
Complete generated text

Example

{
  "type": "response.text.done",
  "event_id": "event_STU901vwx",
  "text": "Hello! How can I help you today?"
}
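
The two text events can be combined on the client: deltas are appended as they arrive, and the done event supplies the authoritative full text. The sketch below is one way to do this; createTextAccumulator is a hypothetical name, and the assumption that the concatenated deltas equal the final text is not stated in this reference, so the code reports whether they match instead of relying on it.

```javascript
// Hypothetical helper: accumulates response.text.delta chunks until response.text.done.
function createTextAccumulator() {
  let buffer = '';
  return {
    handle(message) {
      if (message.type === 'response.text.delta') {
        buffer += message.delta;
        return null;
      }
      if (message.type === 'response.text.done') {
        // Prefer the server's full text; note whether the deltas agreed with it.
        const result = { text: message.text, matchesDeltas: buffer === message.text };
        buffer = ''; // reset for the next response
        return result;
      }
      return null;
    },
    current: () => buffer,
  };
}

// Usage with the example payloads from this page.
const textAcc = createTextAccumulator();
textAcc.handle({ type: 'response.text.delta', delta: 'Hello! How can I ' });
textAcc.handle({ type: 'response.text.delta', delta: 'help you today?' });
const finalText = textAcc.handle({
  type: 'response.text.done',
  text: 'Hello! How can I help you today?',
});
```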

conversation.item.input_audio_transcription.delta

Streams real-time transcription of user speech.

Response Fields

type
string
required
Always "conversation.item.input_audio_transcription.delta"
event_id
string
required
Unique event identifier
delta
string
required
Transcription text chunk
start_time
number
required
Timestamp when speech started (Unmute extension)

Example

{
  "type": "conversation.item.input_audio_transcription.delta",
  "event_id": "event_VWX234yza",
  "delta": "Hello, can you",
  "start_time": 1234567890.123
}
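
To show a live transcript, a client can collect the deltas in arrival order together with their start_time. The following sketch uses a hypothetical createTranscriptLog helper; joining trimmed chunks with a single space is an assumption made for display purposes, since this reference does not specify how whitespace is distributed across deltas.

```javascript
// Hypothetical helper: stores transcription deltas with their start_time.
function createTranscriptLog() {
  const entries = [];
  return {
    handle(message) {
      if (message.type !== 'conversation.item.input_audio_transcription.delta') {
        return false; // not a transcription event
      }
      entries.push({ text: message.delta, startTime: message.start_time });
      return true;
    },
    // Assumption: trimmed chunks joined by one space are good enough for display.
    text: () => entries.map((e) => e.text.trim()).filter(Boolean).join(' '),
    entries: () => entries.slice(),
  };
}
```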

input_audio_buffer.speech_started

Indicates that speech was detected in the user’s audio input.

Note: This event is triggered by speech-to-text output, not by voice activity detection (VAD), so it is only sent once actual speech has been transcribed.

Response Fields

type
string
required
Always "input_audio_buffer.speech_started"
event_id
string
required
Unique event identifier

Example

{
  "type": "input_audio_buffer.speech_started",
  "event_id": "event_YZA567bcd"
}

input_audio_buffer.speech_stopped

Indicates that a pause was detected in the user’s audio input.

Note: This event is based on voice activity detection (VAD).

Response Fields

type
string
required
Always "input_audio_buffer.speech_stopped"
event_id
string
required
Unique event identifier

Example

{
  "type": "input_audio_buffer.speech_stopped",
  "event_id": "event_BCD890efg"
}
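
The speech_started and speech_stopped events can drive a simple "user is speaking" flag, for example to update a UI indicator. The sketch below uses a hypothetical createSpeechTracker helper. Because the two events come from different detectors (speech-to-text and VAD), it does not assume they arrive strictly paired and tolerates repeats.

```javascript
// Hypothetical helper: tracks whether the user is currently speaking.
function createSpeechTracker() {
  let userSpeaking = false;
  return {
    handle(message) {
      if (message.type === 'input_audio_buffer.speech_started') {
        userSpeaking = true;
      } else if (message.type === 'input_audio_buffer.speech_stopped') {
        userSpeaking = false;
      }
      return userSpeaking; // other event types leave the flag unchanged
    },
    isSpeaking: () => userSpeaking,
  };
}
```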

unmute.interrupted_by_vad

Indicates that the voice activity detector interrupted the assistant’s response generation because the user started speaking.

Unmute Extension: This event is specific to Unmute.

Response Fields

type
string
required
Always "unmute.interrupted_by_vad"
event_id
string
required
Unique event identifier

Example

{
  "type": "unmute.interrupted_by_vad",
  "event_id": "event_EFG123hij"
}
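
A client that buffers audio before playback will usually want to discard what is still queued when this event arrives, so the assistant stops talking over the user. This reference does not mandate that behavior; the sketch below is one reasonable approach, and createPlaybackQueue is a hypothetical name.

```javascript
// Hypothetical helper: queues audio deltas and drops them on interruption.
function createPlaybackQueue() {
  let chunks = [];
  return {
    handle(message) {
      if (message.type === 'response.audio.delta') {
        chunks.push(message.delta); // still base64-encoded Opus
      } else if (message.type === 'unmute.interrupted_by_vad') {
        chunks = []; // assumption: queued audio is stale once the user speaks
      }
      return chunks.length;
    },
    next: () => chunks.shift(), // undefined when the queue is empty
    size: () => chunks.length,
  };
}
```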

unmute.response.text.delta.ready

Indicates that a text delta is ready for processing.

Unmute Extension: This event is specific to Unmute.

Response Fields

type
string
required
Always "unmute.response.text.delta.ready"
event_id
string
required
Unique event identifier
delta
string
required
Text chunk that is ready

Example

{
  "type": "unmute.response.text.delta.ready",
  "event_id": "event_HIJ456klm",
  "delta": "help you today?"
}

unmute.response.audio.delta.ready

Indicates that an audio delta is ready with sample count information.

Unmute Extension: This event is specific to Unmute.

Response Fields

type
string
required
Always "unmute.response.audio.delta.ready"
event_id
string
required
Unique event identifier
number_of_samples
integer
required
Number of audio samples in this chunk

Example

{
  "type": "unmute.response.audio.delta.ready",
  "event_id": "event_KLM789nop",
  "number_of_samples": 480
}
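
The sample count can be converted to a duration. Assuming number_of_samples is counted at the 24 kHz rate given for response.audio.delta (this reference does not state so explicitly), the 480 samples in the example correspond to 480 / 24000 s = 20 ms. A minimal sketch, with hypothetical helper names:

```javascript
// Assumption: samples are counted at the 24 kHz rate listed for response.audio.delta.
const SAMPLE_RATE_HZ = 24000;

// Convert a sample count to milliseconds (multiply first to keep the math exact).
function samplesToMilliseconds(numberOfSamples, sampleRateHz = SAMPLE_RATE_HZ) {
  return (numberOfSamples * 1000) / sampleRateHz;
}

// Hypothetical helper: sums the duration of all audio announced so far.
function createDurationCounter() {
  let totalSamples = 0;
  return {
    handle(message) {
      if (message.type === 'unmute.response.audio.delta.ready') {
        totalSamples += message.number_of_samples;
      }
      return samplesToMilliseconds(totalSamples);
    },
  };
}
```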

unmute.additional_outputs

Provides additional debug or metadata outputs from the system.

Unmute Extension: This event is specific to Unmute and used for debugging.

Response Fields

type
string
required
Always "unmute.additional_outputs"
event_id
string
required
Unique event identifier
args
any
required
Additional output data (structure varies)

Example

{
  "type": "unmute.additional_outputs",
  "event_id": "event_NOP012qrs",
  "args": {
    "debug_info": "Processing latency: 45ms"
  }
}

error

Reports errors during the WebSocket session.

Response Fields

type
string
required
Always "error"
event_id
string
required
Unique event identifier
error
object
required
Error details
error.type
string
required
Error type (e.g., "invalid_request_error", "fatal")
error.code
string
Error code (optional)
error.message
string
required
Human-readable error message
error.param
string
Parameter that caused the error (optional)
error.details
any
Additional error details (Unmute extension, optional)

Example: Invalid JSON

{
  "type": "error",
  "event_id": "event_QRS345tuv",
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid JSON: Expecting value: line 1 column 1 (char 0)"
  }
}

Example: Fatal Error

{
  "type": "error",
  "event_id": "event_TUV678wxy",
  "error": {
    "type": "fatal",
    "message": "Too many people are connected to service 'tts'. Please try again later."
  }
}

Note: Fatal errors typically result in the WebSocket connection being closed by the server.
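
On the client, it can help to separate fatal errors from recoverable ones before deciding how to react. The sketch below builds a log line from the documented fields and flags the "fatal" type; describeError is a hypothetical helper, and what the client does with the result (for example, waiting before reconnecting) is left to the application.

```javascript
// Hypothetical helper: summarizes an error event and flags fatal errors.
function describeError(message) {
  if (message.type !== 'error') {
    return null; // not an error event
  }
  const { type, code, message: text, param } = message.error;
  const parts = [`[${type}]`];
  if (code) parts.push(`(${code})`); // optional field
  parts.push(text);
  if (param) parts.push(`param=${param}`); // optional field
  return { fatal: type === 'fatal', summary: parts.join(' ') };
}
```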

Next Steps

Client Events

Events sent from client to server

Session Management

Configure voice and conversation settings
