Overview
Server events are messages sent from the Unmute backend to the client. These events stream audio responses, transcriptions, and status updates.
session.updated
Confirms that the session configuration was successfully updated.
Response Fields
- Event type: always "session.updated"
- Unique event identifier
- The updated session configuration (mirrors the session.update request)
Example
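A minimal sketch of waiting for this confirmation after sending a configuration change. It assumes each event is a JSON object with its name in a `type` field and the configuration in a `session` field, following the OpenAI Realtime API naming that these event names mirror. The helper name `updateSession` and the timeout are hypothetical.

```javascript
// Sketch: send session.update, then resolve with the configuration echoed back
// in session.updated. Assumes the event name is in `type` and the configuration
// in `session` (OpenAI Realtime API naming; an assumption, not confirmed here).
function updateSession(ws, session, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    function onMessage(event) {
      const msg = JSON.parse(event.data);
      if (msg.type !== "session.updated") return; // ignore unrelated events
      clearTimeout(timer);
      ws.removeEventListener("message", onMessage);
      resolve(msg.session);
    }

    // Give up if the server never confirms the update.
    const timer = setTimeout(() => {
      ws.removeEventListener("message", onMessage);
      reject(new Error("Timed out waiting for session.updated"));
    }, timeoutMs);

    ws.addEventListener("message", onMessage);
    ws.send(JSON.stringify({ type: "session.update", session }));
  });
}
```

Removing the listener on both success and timeout keeps repeated configuration changes from accumulating stale handlers.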
response.created
Indicates that the assistant has started generating a response.
Response Fields
- Event type: always "response.created"
- Unique event identifier
- Response metadata:
  - Object type: always "realtime.response"
  - Response status, one of "in_progress", "completed", "cancelled", "failed", or "incomplete"
  - Voice identifier used for this response
  - Conversation history (array of message objects)
Example
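A small sketch of reading the status listed above. It assumes the metadata object is in a `response` field and the status in `response.status`, following OpenAI Realtime API naming; those key names and the function name `isResponseInProgress` are assumptions.

```javascript
// Status values documented for response.created.
const RESPONSE_STATUSES = new Set([
  "in_progress", "completed", "cancelled", "failed", "incomplete",
]);

// Returns true while the assistant is still generating. Assumes the metadata
// object is in `response` and its status in `response.status` (OpenAI Realtime
// API naming; an assumption here). Unknown statuses are treated as errors.
function isResponseInProgress(msg) {
  if (msg.type !== "response.created") return false;
  const status = msg.response && msg.response.status;
  if (!RESPONSE_STATUSES.has(status)) {
    throw new Error(`Unexpected response status: ${status}`);
  }
  return status === "in_progress";
}
```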
response.audio.delta
Streams generated speech audio to the client.
Response Fields
- Event type: always "response.audio.delta"
- Unique event identifier
- Base64-encoded Opus audio chunk
Audio Specifications:
- Codec: Opus
- Sample Rate: 24 kHz
- Channels: Mono
- Encoding: Base64 string
Example
Implementation Notes
- Audio chunks are sent as they become available from the text-to-speech system
- Because the Opus encoder buffers audio, not every PCM chunk from the text-to-speech system produces a response.audio.delta event
- Decode and play chunks in the order they are received
JavaScript Example
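A minimal sketch of the ordering requirement above: decode each base64 chunk to bytes, queue it, and hand chunks to an Opus decoder oldest first. It assumes the base64 payload is in a `delta` field (OpenAI Realtime API naming). `decodeOpus` is a hypothetical callback standing in for whichever Opus decoder the client uses; no particular library is implied.

```javascript
// Convert a base64 string to raw bytes. atob() exists in browsers and Node.js 16+.
function base64ToBytes(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Queues Opus chunks from response.audio.delta and decodes them in arrival order.
// Assumptions: the base64 payload is in `delta`, and `decodeOpus` is a
// hypothetical callback wrapping whichever Opus decoder the client uses.
class AudioChunkQueue {
  constructor(decodeOpus) {
    this.decodeOpus = decodeOpus;
    this.pending = []; // Opus bytes, oldest first
    this.done = false; // set once response.audio.done arrives
  }

  handle(msg) {
    if (msg.type === "response.audio.delta") {
      this.pending.push(base64ToBytes(msg.delta));
    } else if (msg.type === "response.audio.done") {
      this.done = true;
    }
  }

  // Decode every queued chunk, oldest first. A decoder may return nothing
  // for a chunk while it buffers, so empty results are skipped.
  drain() {
    const pcmChunks = [];
    while (this.pending.length > 0) {
      const pcm = this.decodeOpus(this.pending.shift());
      if (pcm != null) pcmChunks.push(pcm);
    }
    return pcmChunks;
  }
}
```

The queue only guarantees ordering; scheduling the decoded PCM for playback (for example through the Web Audio API) is left to the client.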
response.audio.done
Indicates that audio streaming for the current response has completed.
Response Fields
- Event type: always "response.audio.done"
- Unique event identifier
Example
response.text.delta
Streams the text being generated (for display or debugging).
Response Fields
- Event type: always "response.text.delta"
- Unique event identifier
- Text chunk being generated
Example
response.text.done
Indicates that text generation is complete and provides the full text.
Response Fields
- Event type: always "response.text.done"
- Unique event identifier
- Complete generated text
Example
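A sketch of combining response.text.delta and response.text.done on the client: show the concatenated chunks while streaming, then switch to the complete text once it arrives. The field names `delta` and `text` are assumptions based on OpenAI Realtime API naming, and `TextAccumulator` is a hypothetical name.

```javascript
// Rebuilds the assistant's text from streamed chunks. Assumes the chunk is in
// `delta` and the full text in `text` (OpenAI Realtime naming; an assumption).
class TextAccumulator {
  constructor() {
    this.parts = [];  // chunks from response.text.delta, in order
    this.final = null; // full text from response.text.done, once received
  }

  handle(msg) {
    if (msg.type === "response.text.delta") {
      this.parts.push(msg.delta);
    } else if (msg.type === "response.text.done") {
      this.final = msg.text;
    }
  }

  // Text so far; prefers the complete text after response.text.done.
  current() {
    return this.final ?? this.parts.join("");
  }
}
```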
conversation.item.input_audio_transcription.delta
Streams real-time transcription of user speech.
Response Fields
- Event type: always "conversation.item.input_audio_transcription.delta"
- Unique event identifier
- Transcription text chunk
- Timestamp when speech started (Unmute extension)
Example
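A sketch of collecting the transcription chunks together with their timestamps. It assumes the chunk is in a `delta` field and the Unmute timestamp in `start_time`; both names are assumptions, and the timestamp's unit is not stated on this page. Whether chunks include their own spacing is also not stated, so the join separator is left as a parameter of the hypothetical `TranscriptBuffer` class.

```javascript
// Collects live transcription chunks of the user's speech.
// Assumptions: the chunk is in `delta` and the Unmute extension timestamp in
// `start_time` (unit unstated). The join separator is configurable because
// chunk spacing is not documented here.
class TranscriptBuffer {
  constructor(separator = " ") {
    this.separator = separator;
    this.chunks = []; // { text, startTime } in arrival order
  }

  handle(msg) {
    if (msg.type !== "conversation.item.input_audio_transcription.delta") return;
    this.chunks.push({ text: msg.delta, startTime: msg.start_time });
  }

  text() {
    return this.chunks.map((c) => c.text).join(this.separator);
  }

  // Timestamp of the first chunk, or null if nothing has been transcribed.
  firstStartTime() {
    return this.chunks.length > 0 ? this.chunks[0].startTime : null;
  }
}
```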
input_audio_buffer.speech_started
Indicates that speech was detected in the user’s audio input.
Note: This event is triggered by speech-to-text, not by voice activity detection (VAD), so it is sent only when speech is actually transcribed.
Response Fields
- Event type: always "input_audio_buffer.speech_started"
- Unique event identifier
Example
input_audio_buffer.speech_stopped
Indicates that a pause was detected in the user’s audio input.
Note: This event is triggered by voice activity detection (VAD).
Response Fields
- Event type: always "input_audio_buffer.speech_stopped"
- Unique event identifier
Example
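A sketch of tracking whether the user is currently talking from the two events above. As the notes state, speech_started comes from speech-to-text and speech_stopped from VAD, so the tracker also records which event set the current state. It assumes the event name is in a `type` field; `UserSpeechState` is a hypothetical name.

```javascript
// Tracks whether the user is speaking. speech_started is produced by
// speech-to-text and speech_stopped by VAD, so the source of the latest
// state change is kept for debugging.
class UserSpeechState {
  constructor() {
    this.speaking = false;
    this.lastChange = null; // event type that produced the current state
  }

  handle(msg) {
    if (msg.type === "input_audio_buffer.speech_started") {
      this.speaking = true;
      this.lastChange = msg.type;
    } else if (msg.type === "input_audio_buffer.speech_stopped") {
      this.speaking = false;
      this.lastChange = msg.type;
    }
  }
}
```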
unmute.interrupted_by_vad
Indicates that the voice activity detector interrupted the assistant’s response generation because the user started speaking.
Unmute Extension: This event is specific to Unmute.
Response Fields
- Event type: always "unmute.interrupted_by_vad"
- Unique event identifier
Example
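One plausible client reaction, sketched under assumptions: since the user has started speaking, discard audio that was received but not yet played and stop playback. The `player` object, with a `pending` array and a `stop()` method, is hypothetical and stands in for whatever playback layer the client has.

```javascript
// Sketch: on unmute.interrupted_by_vad, drop queued audio and stop playback.
// `player` is a hypothetical object with a `pending` array and a `stop()` method.
// Returns true if the event was an interruption and was handled.
function handleInterruption(msg, player) {
  if (msg.type !== "unmute.interrupted_by_vad") return false;
  player.pending.length = 0; // discard chunks that have not been played yet
  player.stop();
  return true;
}
```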
unmute.response.text.delta.ready
Indicates that a text delta is ready for processing.
Unmute Extension: This event is specific to Unmute.
Response Fields
- Event type: always "unmute.response.text.delta.ready"
- Unique event identifier
- Text chunk that is ready
Example
unmute.response.audio.delta.ready
Indicates that an audio delta is ready and reports its sample count.
Unmute Extension: This event is specific to Unmute.
Response Fields
- Event type: always "unmute.response.audio.delta.ready"
- Unique event identifier
- Number of audio samples in this chunk
Example
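A sketch of turning the sample count into playback time, for example to align displayed text with speech. It assumes the count refers to the 24 kHz mono output described under response.audio.delta and that it is carried in a `number_of_samples` field; both are assumptions. At 24 kHz, a 480-sample chunk is 480 / 24000 = 0.02 s, or 20 ms.

```javascript
// Sketch: convert sample counts from unmute.response.audio.delta.ready into
// milliseconds. Assumes 24 kHz mono output and a `number_of_samples` field.
const SAMPLE_RATE_HZ = 24000;

function chunkDurationMs(numberOfSamples, sampleRateHz = SAMPLE_RATE_HZ) {
  // Multiply first so whole-millisecond results stay exact.
  return (numberOfSamples * 1000) / sampleRateHz;
}

// Total duration of all generated audio reported so far.
function totalDurationMs(events) {
  return events
    .filter((e) => e.type === "unmute.response.audio.delta.ready")
    .reduce((sum, e) => sum + chunkDurationMs(e.number_of_samples), 0);
}
```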
unmute.additional_outputs
Provides additional debug or metadata outputs from the system.
Unmute Extension: This event is specific to Unmute and is used for debugging.
Response Fields
- Event type: always "unmute.additional_outputs"
- Unique event identifier
- Additional output data (structure varies)
Example
error
Reports errors during the WebSocket session.
Response Fields
- Event type: always "error"
- Unique event identifier
- Error details:
  - Error type (e.g., "invalid_request_error", "fatal")
  - Error code (optional)
  - Human-readable error message
  - Parameter that caused the error (optional)
  - Additional error details (Unmute extension, optional)
Example: Invalid JSON
Example: Fatal Error
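A sketch of routing errors by type, treating "fatal" separately from other error types such as "invalid_request_error". It assumes the details are in an `error` object with keys `type`, `code`, `message`, `param`, and `details`, corresponding to the error fields listed above; the key names and the handler names `onFatal` and `onNonFatal` are assumptions.

```javascript
// Routes server `error` events to fatal or non-fatal handlers.
// Assumptions: details are in an `error` object with keys `type`, `code`,
// `message`, `param`, and `details`. Returns false for non-error events.
function handleServerError(msg, { onFatal, onNonFatal }) {
  if (msg.type !== "error") return false;
  const err = msg.error || {};
  // e.g. "invalid_request_error: <message>"; missing parts are skipped.
  const summary = [err.type, err.code, err.message].filter(Boolean).join(": ");
  if (err.type === "fatal") {
    onFatal(summary, err); // e.g. close the WebSocket and inform the user
  } else {
    onNonFatal(summary, err); // e.g. log it and correct the offending request
  }
  return true;
}
```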
Next Steps
- Client Events: Events sent from client to server
- Session Management: Configure voice and conversation settings