Overview
Session management controls the behavior of the Unmute conversation system, including voice selection, conversation instructions, and recording preferences.
Session Configuration
Sessions are configured using the session.update client event. The backend requires this configuration before it begins processing audio.
Configuration Object
The session configuration is defined by the SessionConfig model:
instructions: Conversation instructions (an Unmute extension to the OpenAI Realtime API), containing:
instructions.character: Character personality and behavior description
instructions.scenario: Conversation scenario or context
voice: Voice identifier for text-to-speech synthesis. Must match a voice ID from the /v1/voices endpoint.
allow_recording: Whether to allow recording of the conversation session. Set to false to disable recording.
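The fields above can be checked client-side before sending a session.update. A minimal sketch; the validateSessionConfig helper is hypothetical, not part of the Unmute API:

```javascript
// Hypothetical helper: checks a session object against the fields
// described above before it is sent in a session.update event.
function validateSessionConfig(session) {
  const errors = [];
  if (typeof session.voice !== 'string' || session.voice.length === 0) {
    errors.push('voice must be a non-empty string');
  }
  if (typeof session.allow_recording !== 'boolean') {
    errors.push('allow_recording must be a boolean');
  }
  if (session.instructions !== undefined) {
    const { character, scenario } = session.instructions;
    if (character !== undefined && typeof character !== 'string') {
      errors.push('instructions.character must be a string');
    }
    if (scenario !== undefined && typeof scenario !== 'string') {
      errors.push('instructions.scenario must be a string');
    }
  }
  return errors;
}
```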
Getting Available Voices
Before configuring a session, retrieve the list of available voices:
HTTP Endpoint
GET /v1/voices
Response
[
  {
    "name": "default",
    "language": "en",
    "gender": "neutral"
  },
  {
    "name": "voice_001",
    "language": "en",
    "gender": "female"
  },
  {
    "name": "voice_002",
    "language": "en",
    "gender": "male"
  }
]
Note: Only voices with good: true are returned. The comment field is excluded from the response.
JavaScript Example
const response = await fetch('http://localhost:8000/v1/voices');
const voices = await response.json();
console.log('Available voices:', voices);
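Since the response is a plain array, you can select a voice by attribute before configuring the session. A small sketch; pickVoice is a hypothetical helper, not part of the API:

```javascript
// Hypothetical helper: picks a voice from the /v1/voices response,
// preferring a language/gender match and falling back to the first entry.
function pickVoice(voices, { language, gender } = {}) {
  const match = voices.find(v =>
    (language === undefined || v.language === language) &&
    (gender === undefined || v.gender === gender));
  return (match || voices[0] || {}).name;
}
```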
Configuring a Session
Send a session.update event immediately after establishing the WebSocket connection.
Example: Basic Configuration
{
  "type": "session.update",
  "session": {
    "voice": "default",
    "allow_recording": false
  }
}
Example: With Instructions
{
  "type": "session.update",
  "session": {
    "instructions": {
      "character": "You are a friendly and helpful AI assistant with a warm personality.",
      "scenario": "You are helping a user learn about voice AI technology."
    },
    "voice": "voice_001",
    "allow_recording": false
  }
}
Server Confirmation
The server responds with a session.updated event:
{
  "type": "session.updated",
  "event_id": "event_ABC123xyz",
  "session": {
    "instructions": {
      "character": "You are a friendly and helpful AI assistant with a warm personality.",
      "scenario": "You are helping a user learn about voice AI technology."
    },
    "voice": "voice_001",
    "allow_recording": false
  }
}
Instructions Object
The instructions field is an Unmute extension to the OpenAI Realtime API. It provides structured guidance to the language model.
Character
Defines the AI assistant’s personality, tone, and behavioral characteristics.
Examples:
"You are a professional medical assistant."
"You are an enthusiastic teacher who loves explaining complex topics."
"You are a concise technical support agent."
Scenario
Provides context about the conversation setting or purpose.
Examples:
"You are helping a user troubleshoot their software issue."
"The user is practicing English conversation."
"You are conducting a job interview simulation."
Updating Session Mid-Conversation
You can send additional session.update events at any time to change the configuration:
// Change voice mid-conversation
// Change voice mid-conversation
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    voice: 'voice_002',
    allow_recording: false
  }
}));
The server will apply the new configuration and respond with session.updated.
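Because the confirmation arrives asynchronously on the same connection, it can help to dispatch it explicitly. A sketch assuming confirmations arrive as shown above; onSessionResult is a hypothetical helper:

```javascript
// Hypothetical helper: returns a message handler that invokes callbacks
// when the server confirms (session.updated) or rejects (error) a
// session.update sent earlier on the same connection.
function onSessionResult(onUpdated, onError) {
  return (event) => {
    const message = JSON.parse(event.data);
    if (message.type === 'session.updated') {
      onUpdated(message.session);
    } else if (message.type === 'error') {
      onError(message.error);
    }
  };
}
```

For example: `ws.onmessage = onSessionResult(s => console.log('applied', s.voice), e => console.error(e.message));`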
Voice Cloning
Unmute supports custom voice cloning. Upload audio samples to create new voices.
Upload Voice Sample
POST /v1/voices
Content-Type: multipart/form-data
Parameters:
file: Audio file (maximum size: configurable via MAX_VOICE_FILE_SIZE_MB)
Response:
{
  "name": "cloned_voice_ABC123"
}
Using Cloned Voice
Once uploaded, use the returned voice name in your session configuration:
{
  "type": "session.update",
  "session": {
    "voice": "cloned_voice_ABC123",
    "allow_recording": false
  }
}
JavaScript Example
// Upload audio file for voice cloning
const formData = new FormData();
formData.append('file', audioFile);

const response = await fetch('http://localhost:8000/v1/voices', {
  method: 'POST',
  body: formData
});
const { name } = await response.json();

// Use the cloned voice
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    voice: name,
    allow_recording: false
  }
}));
Recording Management
The allow_recording parameter controls whether conversations are recorded for later analysis.
Disabling Recording
{
  "type": "session.update",
  "session": {
    "voice": "default",
    "allow_recording": false
  }
}
Enabling Recording
{
  "type": "session.update",
  "session": {
    "voice": "default",
    "allow_recording": true
  }
}
Note: When recording is enabled, the server stores:
Client events (with audio anonymized as sample counts)
Server events (responses, transcriptions, etc.)
Conversation metadata
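To illustrate what "anonymized as sample counts" could mean, the sketch below replaces a base64 audio payload with a count of samples. The event shape and the 16-bit PCM assumption are ours, not taken from the Unmute source:

```javascript
// Illustrative sketch (not the server's actual code): strip the audio
// payload from a client event and record only how many samples it held.
// Assumes the payload is base64-encoded 16-bit PCM (2 bytes per sample).
function anonymizeAudioEvent(event) {
  if (!event.audio) return event;
  const byteLength = Buffer.from(event.audio, 'base64').length;
  const { audio, ...rest } = event;
  return { ...rest, sample_count: Math.floor(byteLength / 2) };
}
```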
Complete Session Setup Example
// 1. Check server health
const health = await fetch('http://localhost:8000/v1/health').then(r => r.json());
if (!health.ok) {
  throw new Error('Server is not healthy');
}

// 2. Get available voices
const voices = await fetch('http://localhost:8000/v1/voices').then(r => r.json());

// 3. Connect to WebSocket
const ws = new WebSocket('ws://localhost:8000/v1/realtime', 'realtime');

ws.onopen = () => {
  // 4. Configure session
  ws.send(JSON.stringify({
    type: 'session.update',
    session: {
      instructions: {
        character: 'You are a helpful AI assistant.',
        scenario: 'General conversation'
      },
      voice: voices[0].name,
      allow_recording: false
    }
  }));
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'session.updated') {
    console.log('Session configured successfully');
    // 5. Start sending audio
  }
};
Error Handling
If session configuration fails, the server sends an error event:
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid message",
    "details": [
      {
        "type": "missing",
        "loc": ["session"],
        "msg": "Field required"
      }
    ]
  }
}
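The details array follows a pydantic-style validation format (type, loc, msg). A small hypothetical helper for turning it into readable log lines:

```javascript
// Hypothetical helper: flattens the "details" array of an error event
// (pydantic-style validation errors) into human-readable strings.
function formatErrorDetails(error) {
  return (error.details || []).map(d => `${(d.loc || []).join('.')}: ${d.msg}`);
}
```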
Next Steps
Client Events: Send audio and commands to the server
Server Events: Receive responses and updates from the server
WebSocket Overview: Learn about the WebSocket protocol