
Overview

The Realtime API enables low-latency, multi-turn conversations with AI models over WebSocket connections. This is ideal for voice assistants, interactive applications, and real-time chat experiences.
The Realtime API is currently supported on the Cloudflare Workers runtime. For Node.js, use the dedicated realtime handler.

Connection

WebSocket Endpoint

wss://your-gateway.com/v1/realtime

Authentication

Pass authentication as query parameters:
wss://your-gateway.com/v1/realtime?provider=openai&apiKey=YOUR_API_KEY&model=gpt-4o-realtime-preview

Query Parameters

  • provider (string, required): the provider to use (e.g., openai)
  • apiKey (string, required): your provider API key
  • model (string, optional): the model to use (default: gpt-4o-realtime-preview)
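Since the API key travels in the query string, it is safest to build the URL with URLSearchParams so every value is percent-encoded. A minimal sketch (the host and key are placeholders):

```javascript
// Build the connection URL; URLSearchParams handles percent-encoding
// of any special characters in the key or model name.
const params = new URLSearchParams({
    provider: 'openai',
    apiKey: 'YOUR_API_KEY',
    model: 'gpt-4o-realtime-preview'
});
const url = `wss://your-gateway.com/v1/realtime?${params}`;
```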

Event Types

Client Events

Events sent from your application to the model:
  • session.update: update session configuration
  • input_audio_buffer.append: add audio data to the input buffer
  • input_audio_buffer.commit: commit the audio buffer for processing
  • conversation.item.create: add a message to the conversation
  • response.create: trigger a model response
  • response.cancel: cancel an in-progress response
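The input_audio_buffer events drive manual turn-taking: append chunks, then commit the buffer and request a response. A brief sketch of that flow (sendEvent and endOfTurn are illustrative helpers, and base64Chunks stands in for real PCM16 audio):

```javascript
// Manual turn-taking: append audio chunks, then commit the buffer
// and ask the model to respond.
function sendEvent(ws, event) {
    ws.send(JSON.stringify(event));
}

function endOfTurn(ws, base64Chunks) {
    for (const chunk of base64Chunks) {
        sendEvent(ws, { type: 'input_audio_buffer.append', audio: chunk });
    }
    sendEvent(ws, { type: 'input_audio_buffer.commit' });
    sendEvent(ws, { type: 'response.create' });
    return base64Chunks.length + 2; // total events sent
}
```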

Server Events

Events sent from the model to your application:
  • session.created: session was successfully created
  • session.updated: session configuration was updated
  • conversation.item.created: a new conversation item was created
  • response.audio.delta: audio response chunk
  • response.audio.done: audio response completed
  • response.text.delta: text response chunk
  • response.text.done: text response completed
  • response.done: response generation completed
  • error: an error occurred
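One way to consume these server events is a handler map keyed by event type; unknown types are ignored, so new server events never break the client. A sketch (the handler bodies are illustrative):

```javascript
// Route server events through a handler map. Unhandled event types
// are silently ignored.
const handlers = {
    'response.text.delta': (e) => process.stdout.write(e.delta),
    'response.done': () => console.log('\n[response complete]'),
    'error': (e) => console.error('Server error:', e.error)
};

function dispatch(raw) {
    const event = JSON.parse(raw);
    const handler = handlers[event.type];
    if (handler) handler(event);
    return event.type;
}
```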

Example

Basic Text Conversation

// Node 22+ and all modern browsers provide a global WebSocket.
// Use ws:// for a local dev server (e.g. wrangler dev on port 8787)
// and wss:// for a deployed gateway.
const ws = new WebSocket(
    'ws://localhost:8787/v1/realtime?' +
    'provider=openai&' +
    'apiKey=YOUR_API_KEY&' +
    'model=gpt-4o-realtime-preview'
);

ws.onopen = () => {
    console.log('Connected to realtime API');
    
    // Configure session
    ws.send(JSON.stringify({
        type: 'session.update',
        session: {
            modalities: ['text'],
            instructions: 'You are a helpful assistant.',
            temperature: 0.8
        }
    }));
    
    // Create a conversation item
    ws.send(JSON.stringify({
        type: 'conversation.item.create',
        item: {
            type: 'message',
            role: 'user',
            content: [{
                type: 'input_text',
                text: 'Hello! How are you?'
            }]
        }
    }));
    
    // Trigger response
    ws.send(JSON.stringify({
        type: 'response.create'
    }));
};

ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    
    if (data.type === 'response.text.delta') {
        // Stream tokens as they arrive (process.stdout is Node-only;
        // append to the DOM in a browser)
        process.stdout.write(data.delta);
    } else if (data.type === 'response.done') {
        console.log('\nResponse complete');
    } else if (data.type === 'error') {
        console.error('Error:', data.error);
    }
};

ws.onerror = (error) => {
    console.error('WebSocket error:', error);
};

ws.onclose = () => {
    console.log('Connection closed');
};

Audio Streaming

const ws = new WebSocket(
    'ws://localhost:8787/v1/realtime?provider=openai&apiKey=YOUR_API_KEY'
);

ws.onopen = () => {
    // Configure for audio
    ws.send(JSON.stringify({
        type: 'session.update',
        session: {
            modalities: ['text', 'audio'],
            voice: 'alloy',
            input_audio_format: 'pcm16',
            output_audio_format: 'pcm16'
        }
    }));
    
    // Stream audio from microphone
    navigator.mediaDevices.getUserMedia({ audio: true })
        .then(stream => {
            const audioContext = new AudioContext({ sampleRate: 24000 });
            const source = audioContext.createMediaStreamSource(stream);
            // Note: ScriptProcessorNode is deprecated; prefer an
            // AudioWorklet in production code
            const processor = audioContext.createScriptProcessor(4096, 1, 1);
            
            processor.onaudioprocess = (e) => {
                const audioData = e.inputBuffer.getChannelData(0);
                const pcm16 = convertFloat32ToPCM16(audioData);
                // Encode byte-by-byte: spreading a large buffer into one
                // String.fromCharCode call can overflow the argument limit
                let binary = '';
                for (let i = 0; i < pcm16.length; i++) {
                    binary += String.fromCharCode(pcm16[i]);
                }
                
                ws.send(JSON.stringify({
                    type: 'input_audio_buffer.append',
                    audio: btoa(binary)
                }));
            };
            
            source.connect(processor);
            processor.connect(audioContext.destination);
        });
};

ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    
    if (data.type === 'response.audio.delta') {
        // Play audio chunk
        const audioData = atob(data.delta);
        playAudioChunk(audioData);
    }
};

function convertFloat32ToPCM16(float32Array) {
    const pcm16 = new Int16Array(float32Array.length);
    for (let i = 0; i < float32Array.length; i++) {
        const s = Math.max(-1, Math.min(1, float32Array[i]));
        pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
    }
    return new Uint8Array(pcm16.buffer);
}
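The audio example above calls playAudioChunk without defining it. A minimal sketch of one possible implementation, assuming a 24 kHz mono PCM16 stream and a browser environment (atob, AudioContext); chunks are scheduled back-to-back for gapless playback:

```javascript
// Decode little-endian PCM16 bytes back to Float32 samples
// (the inverse of convertFloat32ToPCM16).
function pcm16ToFloat32(bytes) {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const float32 = new Float32Array(bytes.byteLength / 2);
    for (let i = 0; i < float32.length; i++) {
        const s = view.getInt16(i * 2, true);
        float32[i] = s < 0 ? s / 0x8000 : s / 0x7FFF;
    }
    return float32;
}

let playbackContext;
let playbackTime = 0;

function playAudioChunk(base64Audio) {
    if (!playbackContext) playbackContext = new AudioContext({ sampleRate: 24000 });

    // Decode base64 into raw bytes
    const raw = atob(base64Audio);
    const bytes = new Uint8Array(raw.length);
    for (let i = 0; i < raw.length; i++) bytes[i] = raw.charCodeAt(i);

    // Wrap the samples in an AudioBuffer
    const samples = pcm16ToFloat32(bytes);
    const buffer = playbackContext.createBuffer(1, samples.length, 24000);
    buffer.getChannelData(0).set(samples);

    // Queue each chunk right after the previous one
    const source = playbackContext.createBufferSource();
    source.buffer = buffer;
    source.connect(playbackContext.destination);
    playbackTime = Math.max(playbackTime, playbackContext.currentTime);
    source.start(playbackTime);
    playbackTime += buffer.duration;
}
```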

Best Practices

  • Use PCM16 at a 24 kHz sample rate for best compatibility
  • Keep audio chunks around 100 ms (2,400 samples at 24 kHz) for optimal latency
  • Buffer audio on the client side to handle network jitter
  • Implement reconnection logic with exponential backoff
  • Monitor connection health with ping/pong frames
  • Close connections gracefully when done
  • Always handle error events from the server
  • Implement timeout logic for responses
  • Provide fallback behavior for connection failures
  • Use audio compression where appropriate
  • Implement voice activity detection to reduce unnecessary data
  • Cache session configuration to avoid repeated updates
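The reconnection advice above can be sketched as exponential backoff with a cap and jitter. connectWithBackoff and backoffDelay are illustrative names, not part of the gateway API:

```javascript
// Jittered exponential backoff: delay doubles per attempt, is capped
// at maxDelayMs, and is randomized between 50% and 100% of that value
// to avoid thundering-herd reconnects.
function backoffDelay(attempt, baseDelayMs = 500, maxDelayMs = 30000) {
    const delay = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
    return delay / 2 + Math.random() * (delay / 2);
}

function connectWithBackoff(url) {
    let attempt = 0;

    function connect() {
        const ws = new WebSocket(url);
        ws.onopen = () => { attempt = 0; };  // reset backoff on success
        ws.onclose = () => {
            setTimeout(connect, backoffDelay(attempt));
            attempt++;
        };
        return ws;
    }

    return connect();
}
```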

Supported Providers

Realtime API support:
  • OpenAI: Full support with gpt-4o-realtime-preview
  • Azure OpenAI: Supported on compatible deployments
Check provider documentation for model availability and pricing.

Related Documentation

  • Chat Completions: standard chat API
  • Audio Speech: text-to-speech API
  • Streaming: HTTP streaming guide
