Introduction
Unmute uses a WebSocket-based protocol inspired by the OpenAI Realtime API for real-time voice conversations. The protocol enables bidirectional streaming of audio, transcriptions, and conversation state.

Connection Details
Endpoint
/v1/realtime

The realtime subprotocol is required. Clients must specify it when establishing the connection, otherwise the server will reject the connection.
Example Connection (JavaScript)
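A minimal connection sketch, assuming a local deployment (the host and port below are placeholders, not part of the protocol):

```javascript
// Placeholder host for illustration; substitute your own deployment.
const HOST = "ws://localhost:8000";

function connectRealtime() {
  // The "realtime" subprotocol is required; without it the server
  // rejects the connection.
  const ws = new WebSocket(`${HOST}/v1/realtime`, ["realtime"]);

  ws.addEventListener("open", () => {
    console.log("connected, negotiated subprotocol:", ws.protocol);
  });

  ws.addEventListener("message", (event) => {
    // Every message is a JSON-encoded event with a type field.
    const msg = JSON.parse(event.data);
    console.log("received:", msg.type);
  });

  return ws;
}
```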
Message Format
All messages are JSON-encoded with a common structure:

- type: The event type identifier (e.g., session.update, response.audio.delta)
- event_id: A unique identifier for the event, automatically generated with the format event_ followed by 21 random alphanumeric characters

Connection Lifecycle
1. Health Check (Optional)
Before connecting, check that the server is healthy.

2. Establish WebSocket Connection
Connect to /v1/realtime with the realtime subprotocol.
3. Configure Session
Send a session.update event to configure the voice and instructions. The backend will not start processing until it receives this message.
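A sketch of the configuration step. The session field names (voice, instructions) are assumptions modeled on the OpenAI-style API this protocol is inspired by; see Session Management for the authoritative schema. The event_id follows the format described under Message Format:

```javascript
// Generate an event_id in the documented format:
// "event_" followed by 21 random alphanumeric characters.
function generateEventId() {
  const chars =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
  let suffix = "";
  for (let i = 0; i < 21; i++) {
    suffix += chars[Math.floor(Math.random() * chars.length)];
  }
  return "event_" + suffix;
}

// The session.voice and session.instructions field names are
// assumptions; check the Session Management page for the real schema.
const sessionUpdate = {
  type: "session.update",
  event_id: generateEventId(),
  session: {
    voice: "some-voice-id", // placeholder voice identifier
    instructions: "You are a helpful assistant.",
  },
};

// ws is an open WebSocket from the connection step:
// ws.send(JSON.stringify(sessionUpdate));
```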
4. Stream Audio
Begin sending input_audio_buffer.append events with microphone audio and receive response.audio.delta events with generated speech.
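A sketch of the streaming loop. Only the event types are given above; the audio payload field name is an assumption following OpenAI Realtime conventions:

```javascript
// Send one chunk of microphone audio. opusBytes is a Uint8Array of
// Opus-encoded audio (24 kHz mono; see Audio Format below).
// The "audio" payload field name is an assumption.
function sendAudioChunk(ws, opusBytes) {
  ws.send(
    JSON.stringify({
      type: "input_audio_buffer.append",
      audio: btoa(String.fromCharCode(...opusBytes)), // base64-encode
    })
  );
}

// Handle an incoming message; returns decoded Opus bytes for
// response.audio.delta events, null otherwise.
function handleMessage(event) {
  const msg = JSON.parse(event.data);
  if (msg.type === "response.audio.delta") {
    // Decode base64 back to raw Opus bytes for playback.
    return Uint8Array.from(atob(msg.audio), (c) => c.charCodeAt(0));
  }
  return null;
}
```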
5. Graceful Shutdown
Close the WebSocket connection when done. The server handles cleanup automatically.

Audio Format
All audio is encoded using the Opus codec with the following specifications:

- Sample Rate: 24 kHz
- Channels: Mono
- Encoding: Base64-encoded Opus bytes
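On the capture side, a browser's WebCodecs AudioEncoder can produce Opus in this format. A configuration sketch (WebCodecs is a standard browser API, not part of this protocol):

```javascript
// Encoder configuration matching the required format:
// Opus codec, 24 kHz sample rate, mono.
const encoderConfig = {
  codec: "opus",
  sampleRate: 24000,
  numberOfChannels: 1,
};

// In a browser:
// const encoder = new AudioEncoder({
//   output: (chunk) => { /* base64-encode chunk bytes and send */ },
//   error: (e) => console.error(e),
// });
// encoder.configure(encoderConfig);
```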
Rate Limiting
The server limits concurrent connections to 4 clients by default. If the limit is reached, the connection will be rejected with an error message.

Error Handling
The server sends error events when issues occur. See Server Events for details.
Common error scenarios:
- Invalid JSON format
- Unrecognized event types
- Service unavailability
- Internal server errors
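A sketch of handling these scenarios on the client. The shape of the error payload is an assumption; see Server Events for the actual schema:

```javascript
// Dispatch an incoming server message; returns the event type, or
// throws if the server sent non-JSON data.
function handleServerEvent(raw) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    throw new Error("server sent non-JSON data");
  }
  if (msg.type === "error") {
    // The msg.error payload shape is an assumption.
    console.error("server error:", msg.error ?? msg);
  }
  return msg.type;
}
```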
Next Steps
Client Events
Events sent from client to server
Server Events
Events sent from server to client
Session Management
Configure voice and conversation settings