WebSocket Overview

Introduction

Unmute uses a WebSocket-based protocol, inspired by the OpenAI Realtime API, for real-time voice conversations. The protocol supports bidirectional streaming of audio, transcriptions, and conversation state.

Connection Details

Endpoint

ws://localhost:8000/v1/realtime

WebSocket subprotocol: realtime

The realtime subprotocol is required. Clients must specify it when establishing the connection; otherwise, the server rejects the connection.

Example Connection (JavaScript)

const ws = new WebSocket(
  'ws://localhost:8000/v1/realtime',
  'realtime'
);

ws.onopen = () => {
  console.log('Connected to Unmute');
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  console.log('Received:', message.type);
};
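In practice, a client routes each incoming event by its type field rather than only logging it. The following is a minimal sketch; createDispatcher is a hypothetical helper, and response.audio.delta is used only because it is one of the event types named on this page.

```javascript
// Hypothetical helper: route each parsed server event to a handler keyed by its type.
function createDispatcher(handlers, onUnknown = () => {}) {
  // A Map avoids accidental matches on inherited object keys such as "toString".
  const table = new Map(Object.entries(handlers));
  return function dispatch(rawData) {
    const message = JSON.parse(rawData);
    const handler = table.get(message.type);
    if (handler) {
      handler(message);
    } else {
      // Events without a registered handler go to the fallback instead of being dropped.
      onUnknown(message);
    }
    return message.type;
  };
}

// Usage with the connection above:
// const dispatch = createDispatcher(
//   { "response.audio.delta": (msg) => console.log("audio chunk", msg.event_id) },
//   (msg) => console.log("unhandled event:", msg.type)
// );
// ws.onmessage = (event) => dispatch(event.data);
```

Keeping a fallback for unknown types makes it easy to notice server events the client does not handle yet.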

Message Format

All messages are JSON-encoded with a common structure:

{
  "type": "event.name",
  "event_id": "event_ABC123xyz",
  // ... additional fields specific to event type
}

type (string, required): The event type identifier (e.g., session.update, response.audio.delta).

event_id (string, required): Unique identifier for the event, automatically generated in the format event_ followed by 21 random alphanumeric characters.
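The event_id format can be reproduced or checked on the client, for example to validate logs or test fixtures. This is a sketch, assuming "alphanumeric" means A-Z, a-z, and 0-9; makeEventId and isValidEventId are hypothetical names, and nothing here implies that the server accepts or requires client-generated IDs.

```javascript
// ASSUMPTION: "alphanumeric" means ASCII letters and digits (62 characters).
const EVENT_ID_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

// Hypothetical helper: build an ID of the form event_ + 21 random alphanumerics.
// Math.random is not cryptographically secure, which is acceptable for log correlation.
function makeEventId() {
  let suffix = "";
  for (let i = 0; i < 21; i++) {
    suffix += EVENT_ID_ALPHABET[Math.floor(Math.random() * EVENT_ID_ALPHABET.length)];
  }
  return `event_${suffix}`;
}

// Checks that a string matches the documented format: event_ + 21 alphanumerics.
function isValidEventId(id) {
  return /^event_[A-Za-z0-9]{21}$/.test(id);
}
```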

Connection Lifecycle

1. Health Check (Optional)

Before connecting, check server health:

curl http://localhost:8000/v1/health

Response:

{
  "tts_up": true,
  "stt_up": true,
  "llm_up": true,
  "voice_cloning_up": true,
  "ok": true
}
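The same check from JavaScript might look like the sketch below. It uses the global fetch available in browsers and Node 18+; checkHealth, isServerReady, and downServices are hypothetical names, and treating ok as the overall readiness flag is an assumption based on the response shown above.

```javascript
// ASSUMPTION: "ok" is the overall readiness flag in the /v1/health payload.
function isServerReady(health) {
  return Boolean(health && health.ok === true);
}

// Lists the backing services that do not report as up (fields ending in "_up").
function downServices(health) {
  return Object.entries(health)
    .filter(([key, value]) => key.endsWith("_up") && value !== true)
    .map(([key]) => key.slice(0, -"_up".length));
}

// Hypothetical helper: query /v1/health before opening the WebSocket.
async function checkHealth(baseUrl = "http://localhost:8000") {
  const response = await fetch(`${baseUrl}/v1/health`);
  if (!response.ok) {
    throw new Error(`Health check failed with HTTP ${response.status}`);
  }
  const health = await response.json();
  return { ready: isServerReady(health), down: downServices(health) };
}
```

Reporting which services are down, not just a single boolean, helps explain why a later connection misbehaves.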

2. Establish WebSocket Connection

Connect to /v1/realtime with the realtime subprotocol.
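A promise-based wrapper makes it easier to wait for the connection before sending the session configuration. This is a sketch; connectToUnmute is a hypothetical name, and the WebSocketImpl parameter exists only so the helper can run outside a browser or be tested with a stand-in.

```javascript
// Hypothetical helper: open the Unmute WebSocket and resolve once the connection is open.
// WebSocketImpl defaults to the global WebSocket (browsers, Node 22+); pass another
// constructor with the same interface to run it elsewhere or in tests.
function connectToUnmute(
  url = "ws://localhost:8000/v1/realtime",
  WebSocketImpl = globalThis.WebSocket
) {
  return new Promise((resolve, reject) => {
    // The "realtime" subprotocol is required; the server rejects connections without it.
    const ws = new WebSocketImpl(url, "realtime");
    ws.onopen = () => resolve(ws);
    // An error before "open" means the connection could not be established.
    // Errors after that are ignored here; attach your own onerror once resolved.
    ws.onerror = (event) => reject(event);
  });
}
```

Once the promise resolves, the caller can attach onmessage and send session.update.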

3. Configure Session

Send a session.update event to configure the voice and instructions. The backend will not start processing until it receives this message.

{
  "type": "session.update",
  "session": {
    "instructions": {
      "character": "helpful assistant",
      "scenario": "general conversation"
    },
    "voice": "default",
    "allow_recording": false
  }
}
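Because the backend waits for session.update before processing anything, a client can send it as soon as the socket opens. A minimal sketch; buildSessionUpdate and sendSessionUpdate are hypothetical helpers, and the default values simply mirror the example above.

```javascript
// Hypothetical helper: serialize a session.update event with the fields shown above.
function buildSessionUpdate({
  character = "helpful assistant",
  scenario = "general conversation",
  voice = "default",
  allowRecording = false,
} = {}) {
  return JSON.stringify({
    type: "session.update",
    session: {
      instructions: { character, scenario },
      voice,
      allow_recording: allowRecording,
    },
  });
}

// Send the configuration first; the backend does not start processing until it arrives.
function sendSessionUpdate(ws, options) {
  ws.send(buildSessionUpdate(options));
}

// Usage with a connected socket:
// ws.onopen = () => sendSessionUpdate(ws, { voice: "default" });
```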

4. Stream Audio

Begin sending input_audio_buffer.append events with microphone audio and receive response.audio.delta events with generated speech.
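The sketch below wraps and unwraps the two audio events. It assumes the field names used by the OpenAI Realtime API that this protocol is modeled on (audio for input_audio_buffer.append, delta for response.audio.delta); those field names and the helper functions are assumptions not confirmed on this page. Producing the Opus bytes themselves (for example, with a recorder library) is out of scope here.

```javascript
// Base64-encode raw bytes (btoa/atob exist in browsers and Node 16+).
function bytesToBase64(bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

function base64ToBytes(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Hypothetical helper: wrap one chunk of Opus-encoded microphone audio.
// ASSUMPTION: the payload field is named "audio", as in the OpenAI Realtime API.
function buildAudioAppend(opusBytes) {
  return JSON.stringify({
    type: "input_audio_buffer.append",
    audio: bytesToBase64(opusBytes),
  });
}

// Hypothetical helper: extract Opus bytes from a response.audio.delta event.
// ASSUMPTION: the payload field is named "delta", as in the OpenAI Realtime API.
function audioFromDelta(message) {
  if (message.type !== "response.audio.delta") return null;
  return base64ToBytes(message.delta);
}
```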

5. Graceful Shutdown

Close the WebSocket connection when done. The server handles cleanup automatically.
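A minimal shutdown sketch: close with status code 1000 (normal closure, defined in RFC 6455), and only if the socket is still connecting or open. closeSession is a hypothetical helper name.

```javascript
// WebSocket readyState values (same numbers as WebSocket.CONNECTING and WebSocket.OPEN).
const WS_CONNECTING = 0;
const WS_OPEN = 1;

// Hypothetical helper: close the connection cleanly; the server cleans up on its side.
function closeSession(ws, reason = "client done") {
  if (ws.readyState === WS_CONNECTING || ws.readyState === WS_OPEN) {
    // 1000 = normal closure (RFC 6455).
    ws.close(1000, reason);
    return true;
  }
  return false; // already closing or closed
}
```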

Audio Format

All audio is encoded using the Opus codec with the following specifications:

Sample Rate: 24 kHz
Channels: Mono
Encoding: Base64-encoded Opus bytes

Both client audio (sent to server) and server audio (received from server) use this format.

Rate Limiting

The server limits concurrent connections to 4 clients by default. If the limit is reached, the connection will be rejected with an error message.
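When the client limit is reached, a client can retry after a delay instead of reconnecting immediately. The sketch below uses capped exponential backoff with full jitter; the retry policy, its default parameters, and the function names are assumptions on the client side, not server behavior.

```javascript
// Hypothetical policy: delay (ms) before retry number `attempt` (0-based),
// doubling each time up to maxMs, randomized ("full jitter") to spread out reconnects.
function backoffDelay(attempt, { baseMs = 1000, maxMs = 30000, random = Math.random } = {}) {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * cap);
}

// Hypothetical wrapper: retry `connect` (any function returning a Promise) a few times.
async function connectWithRetry(connect, { maxAttempts = 5, ...backoff } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelay(attempt, backoff);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Jitter matters here because several rejected clients retrying on the same schedule would otherwise hit the limit again together.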

Error Handling

The server sends error events when issues occur. See Server Events for details. Common error scenarios:

Invalid JSON format
Unrecognized event types
Service unavailability
Internal server errors
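On the client, it helps to separate server error events from normal events and to guard against payloads that are not valid JSON. This sketch assumes error events follow the OpenAI Realtime API shape, { type: "error", error: { message: ... } }; that shape and the classifyMessage helper are assumptions to check against the Server Events page.

```javascript
// Hypothetical helper: classify one raw WebSocket payload.
// ASSUMPTION: server errors arrive as { type: "error", error: { message: ... } }.
function classifyMessage(rawData) {
  let message;
  try {
    message = JSON.parse(rawData);
  } catch {
    return { kind: "invalid_json", detail: String(rawData).slice(0, 100) };
  }
  // Valid JSON that is not an object (e.g. null or a number) is not a protocol event.
  if (message === null || typeof message !== "object") {
    return { kind: "invalid_json", detail: String(rawData).slice(0, 100) };
  }
  if (message.type === "error") {
    const detail = message.error?.message ?? "unknown server error";
    return { kind: "server_error", detail };
  }
  return { kind: "event", type: message.type, message };
}
```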

Next Steps

Client Events: events sent from client to server

Server Events: events sent from server to client

Session Management: configure voice and conversation settings
