
Overview

Client events are messages sent from the frontend to the Unmute backend server. These events control session configuration and stream audio data.

session.update

Configure the conversation session, including voice selection and conversation instructions. The backend requires this event before it begins processing. Send this immediately after connecting.

Parameters

type
string
required
Must be "session.update"
event_id
string
required
Unique event identifier (auto-generated)
session
object
required
Session configuration object
session.instructions
object
Conversation instructions (Unmute extension)
session.instructions.character
string
Character personality and behavior
session.instructions.scenario
string
Conversation scenario or context
session.voice
string
Voice identifier for text-to-speech. Retrieve the available voices from the /v1/voices endpoint.
session.allow_recording
boolean
required
Whether to allow recording of the conversation. Set to false to disable recording.
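You can fetch the voice list before sending session.update. The sketch below assumes the backend is reachable at a base URL of your choosing and that a plain GET on /v1/voices returns JSON. This page does not describe the response shape, so the helper returns the parsed body unchanged. The `fetchVoices` name and the injectable `fetchImpl` parameter are hypothetical.

```javascript
// Hypothetical helper: GET the voice list from /v1/voices.
// `fetchImpl` defaults to the global fetch and can be swapped out for testing.
async function fetchVoices(baseUrl, fetchImpl = fetch) {
  // '/v1/voices' is an absolute path, so it replaces any path in baseUrl
  const url = new URL('/v1/voices', baseUrl);
  const response = await fetchImpl(url.toString());
  if (!response.ok) {
    throw new Error(`GET ${url} failed with HTTP ${response.status}`);
  }
  // The response shape is not specified here; return the parsed JSON as-is
  return response.json();
}
```

Pick one of the returned identifiers and pass it as `session.voice`.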

Example

{
  "type": "session.update",
  "event_id": "event_ABC123xyz",
  "session": {
    "instructions": {
      "character": "You are a helpful AI assistant.",
      "scenario": "Casual conversation"
    },
    "voice": "default",
    "allow_recording": false
  }
}
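The following minimal sketch builds this event and sends it from a browser client as soon as the connection opens. It assumes `ws` is a WebSocket already connected to the Unmute backend. The helper names are hypothetical, and so is the event_id format: it is modeled on the example above, while the documentation only says the identifier must be unique.

```javascript
// Hypothetical event_id generator: "event_" plus 12 random alphanumerics,
// mirroring the example above (an assumption, not a documented format).
function makeEventId() {
  const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  let id = 'event_';
  for (let i = 0; i < 12; i++) {
    id += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return id;
}

// Hypothetical builder for a session.update event.
function buildSessionUpdate({ character, scenario, voice, allowRecording }) {
  // allow_recording is required, so refuse to build an event without it
  if (typeof allowRecording !== 'boolean') {
    throw new TypeError('allowRecording must be a boolean');
  }
  return {
    type: 'session.update',
    event_id: makeEventId(),
    session: {
      instructions: { character, scenario },
      voice,
      allow_recording: allowRecording
    }
  };
}

// Send the configuration immediately after the socket opens.
function configureOnOpen(ws, options) {
  ws.addEventListener('open', () => {
    ws.send(JSON.stringify(buildSessionUpdate(options)));
  });
}
```

For example, `configureOnOpen(ws, { character: 'You are a helpful AI assistant.', scenario: 'Casual conversation', voice: 'default', allowRecording: false })` sends the same configuration as the example above, with a freshly generated event_id.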

Response

The server responds with a session.updated event confirming the configuration.
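The backend does not start processing until it has a configuration, so a client may want to wait for session.updated before streaming audio. The sketch below assumes the same `ws` WebSocket object. The helper name and the 5-second timeout are illustrative choices. It treats an `error` event as a failure, following the Error Handling section below.

```javascript
// Resolve with the session.updated event, or reject on an error event or timeout.
function waitForSessionUpdated(ws, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error('Timed out waiting for session.updated'));
    }, timeoutMs);

    function onMessage(event) {
      let message;
      try {
        message = JSON.parse(event.data);
      } catch {
        return; // ignore frames that are not JSON
      }
      if (message.type === 'session.updated') {
        cleanup();
        resolve(message);
      } else if (message.type === 'error') {
        cleanup();
        reject(new Error(message.error?.message ?? 'Unknown server error'));
      }
      // Any other event type is ignored while waiting
    }

    function cleanup() {
      clearTimeout(timer);
      ws.removeEventListener('message', onMessage);
    }

    ws.addEventListener('message', onMessage);
  });
}
```

A client would typically call `await waitForSessionUpdated(ws)` right after sending session.update and only then start the microphone recorder.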

input_audio_buffer.append

Stream audio data from the user’s microphone to the server.

Parameters

type
string
required
Must be "input_audio_buffer.append"
event_id
string
required
Unique event identifier (auto-generated)
audio
string
required
Base64-encoded Opus audio data.
Audio specifications:
  • Codec: Opus in an Ogg container (the example payload begins with "T2dnUw", the base64 encoding of "OggS")
  • Sample Rate: 24 kHz
  • Channels: Mono
  • Encoding: Base64 string

Example

{
  "type": "input_audio_buffer.append",
  "event_id": "event_XYZ789abc",
  "audio": "T2dnUwACAAAAAAAAAADqnjMlAAAAAP4lQ6gBE09w..." 
}
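To build this event from raw bytes, such as the output of an Ogg Opus encoder, base64-encode them first. The sketch below uses the standard btoa() function, which expects a "binary string", so it first maps each byte to a character in the 0-255 range. The helper names are hypothetical. It omits event_id, as the JavaScript example in this section does.

```javascript
// Hypothetical helper: base64-encode a Uint8Array.
function bytesToBase64(bytes) {
  let binary = '';
  // Convert in slices so String.fromCharCode never gets a huge argument list
  const sliceSize = 0x8000;
  for (let i = 0; i < bytes.length; i += sliceSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + sliceSize));
  }
  return btoa(binary);
}

// Hypothetical helper: wrap Ogg Opus bytes in an input_audio_buffer.append event.
function buildAppendEvent(oggOpusBytes) {
  return {
    type: 'input_audio_buffer.append',
    audio: bytesToBase64(oggOpusBytes)
  };
}
```

For example, the six bytes `OggS`, 0x00, 0x02 encode to "T2dnUwAC", which matches the start of the example payload above.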

Implementation Notes

  • The server decodes the base64 audio and processes it through an Opus stream reader
  • The first packet must be an Ogg page with the “beginning of stream” flag set (value 0x02 in the header-type byte, which is byte 5 counting from zero)
  • Audio is processed in real-time for speech detection and transcription
  • The server may send input_audio_buffer.speech_started when speech is detected
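A client could run a sanity check on the first chunk before sending it. The sketch below reads the beginning-of-stream requirement as the 0x02 flag in the header-type byte of the Ogg page header, at zero-based offset 5. This matches the example payload: "T2dnUwAC" decodes to the bytes `OggS`, 0x00 (version), 0x02 (beginning of stream). The function name is hypothetical.

```javascript
// Return true if `bytes` starts with an Ogg page that has the
// "beginning of stream" flag (0x02) set in its header-type byte.
function startsWithOggBos(bytes) {
  // The fixed part of an Ogg page header is 27 bytes long
  if (bytes.length < 27) return false;
  // Bytes 0-3 are the capture pattern "OggS"
  const capture = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  // Byte 4 is the stream structure version; byte 5 holds the header-type flags
  return capture === 'OggS' && (bytes[5] & 0x02) !== 0;
}
```

A WebM recording, for instance, starts with the EBML magic bytes 0x1A 0x45 0xDF 0xA3 and fails this check.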

JavaScript Example

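// Assumes `ws` is an open WebSocket connected to the Unmute backend.

// Capture microphone audio (the browser may treat these constraints as hints)
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    sampleRate: 24000,
    channelCount: 1
  }
});

// The server expects Opus in an Ogg container. Not every browser can record
// Ogg with MediaRecorder (some offer only WebM), so check support first; if
// this fails, a separate Ogg Opus encoder is needed instead.
const mimeType = 'audio/ogg;codecs=opus';
if (!MediaRecorder.isTypeSupported(mimeType)) {
  throw new Error(`${mimeType} recording is not supported in this browser`);
}

const mediaRecorder = new MediaRecorder(stream, {
  mimeType,
  audioBitsPerSecond: 16000
});

mediaRecorder.ondataavailable = (event) => {
  if (event.data.size === 0) return; // skip empty chunks

  // Convert the Blob to base64 via a data URL and send it
  const reader = new FileReader();
  reader.onloadend = () => {
    // reader.result has the form "data:<mime type>;base64,<payload>"
    const base64Audio = reader.result.split(',')[1];
    ws.send(JSON.stringify({
      type: 'input_audio_buffer.append',
      audio: base64Audio
    }));
  };
  reader.readAsDataURL(event.data);
};

mediaRecorder.start(100); // Emit a chunk roughly every 100 ms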

Error Handling

If the client sends invalid messages, the server responds with an error event:

Invalid JSON

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid JSON: Expecting value: line 1 column 1 (char 0)"
  }
}

Invalid Message Structure

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid message",
    "details": [
      {
        "type": "missing",
        "loc": ["session"],
        "msg": "Field required"
      }
    ]
  }
}
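A client can turn these error events into readable log lines. The sketch below handles both shapes shown above, with and without a `details` array. The function name and output format are hypothetical.

```javascript
// Format a server error event as text; return null for non-error events.
function describeServerError(event) {
  if (event.type !== 'error' || !event.error) {
    return null;
  }
  const { type, message, details } = event.error;
  let text = `${type}: ${message}`;
  // Append per-field validation details, e.g. "session: Field required"
  if (Array.isArray(details) && details.length > 0) {
    const fields = details.map((d) => `${(d.loc || []).join('.')}: ${d.msg}`);
    text += ` (${fields.join('; ')})`;
  }
  return text;
}
```

For the "Invalid Message Structure" example above, this returns `invalid_request_error: Invalid message (session: Field required)`. A client might call it from a `message` listener on the WebSocket and log any non-null result.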

Next Steps

Server Events

Learn about events sent from server to client

Session Management

Configure voice and conversation settings
