
Overview

The Realtime API enables low-latency, multi-turn conversations with AI models over WebSocket connections. This is ideal for voice assistants, interactive applications, and real-time chat experiences.
The Realtime API is currently supported on the Cloudflare Workers runtime. For Node.js, use the dedicated realtime handler.

Connection

WebSocket Endpoint

wss://your-gateway.com/v1/realtime

Authentication

Pass authentication as query parameters:
wss://your-gateway.com/v1/realtime?provider=openai&apiKey=YOUR_API_KEY&model=gpt-4o-realtime-preview

Query Parameters

  • provider (string, required): the provider to use (e.g., openai)
  • apiKey (string, required): your provider API key
  • model (string, optional): the model to use (default: gpt-4o-realtime-preview)
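Since the API key travels in the query string, it is safest to build the URL with URLSearchParams so every value is percent-encoded. A minimal sketch (the host and key are placeholders):

```javascript
// Build the connection URL; URLSearchParams handles percent-encoding
// of any special characters in the key or model name.
const params = new URLSearchParams({
    provider: 'openai',
    apiKey: 'YOUR_API_KEY',
    model: 'gpt-4o-realtime-preview'
});
const url = `wss://your-gateway.com/v1/realtime?${params}`;
```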

Event Types

Client Events

Events sent from your application to the model:
  • session.update: update session configuration
  • input_audio_buffer.append: add audio data to the input buffer
  • input_audio_buffer.commit: commit the audio buffer for processing
  • conversation.item.create: add a message to the conversation
  • response.create: trigger a model response
  • response.cancel: cancel an in-progress response
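The input_audio_buffer events drive manual turn-taking: append chunks, then commit the buffer and request a response. A brief sketch of that flow (sendEvent and endOfTurn are illustrative helpers, and base64Chunks stands in for real PCM16 audio):

```javascript
// Manual turn-taking: append audio chunks, then commit the buffer
// and ask the model to respond.
function sendEvent(ws, event) {
    ws.send(JSON.stringify(event));
}

function endOfTurn(ws, base64Chunks) {
    for (const chunk of base64Chunks) {
        sendEvent(ws, { type: 'input_audio_buffer.append', audio: chunk });
    }
    sendEvent(ws, { type: 'input_audio_buffer.commit' });
    sendEvent(ws, { type: 'response.create' });
    return base64Chunks.length + 2; // total events sent
}
```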

Server Events

Events sent from the model to your application:
  • session.created: session was successfully created
  • session.updated: session configuration was updated
  • conversation.item.created: a new conversation item was created
  • response.audio.delta: audio response chunk
  • response.audio.done: audio response completed
  • response.text.delta: text response chunk
  • response.text.done: text response completed
  • response.done: response generation completed
  • error: an error occurred
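One way to consume these server events is a handler map keyed by event type; unknown types are ignored, so new server events never break the client. A sketch (the handler bodies are illustrative):

```javascript
// Route server events through a handler map. Unhandled event types
// are silently ignored.
const handlers = {
    'response.text.delta': (e) => process.stdout.write(e.delta),
    'response.done': () => console.log('\n[response complete]'),
    'error': (e) => console.error('Server error:', e.error)
};

function dispatch(raw) {
    const event = JSON.parse(raw);
    const handler = handlers[event.type];
    if (handler) handler(event);
    return event.type;
}
```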

Example

Basic Text Conversation

// Node 22+ and all modern browsers provide a global WebSocket.
// Use ws:// for a local dev server (e.g. wrangler dev on port 8787)
// and wss:// for a deployed gateway.
const ws = new WebSocket(
    'ws://localhost:8787/v1/realtime?' +
    'provider=openai&' +
    'apiKey=YOUR_API_KEY&' +
    'model=gpt-4o-realtime-preview'
);

ws.onopen = () => {
    console.log('Connected to realtime API');
    
    // Configure session
    ws.send(JSON.stringify({
        type: 'session.update',
        session: {
            modalities: ['text'],
            instructions: 'You are a helpful assistant.',
            temperature: 0.8
        }
    }));
    
    // Create a conversation item
    ws.send(JSON.stringify({
        type: 'conversation.item.create',
        item: {
            type: 'message',
            role: 'user',
            content: [{
                type: 'input_text',
                text: 'Hello! How are you?'
            }]
        }
    }));
    
    // Trigger response
    ws.send(JSON.stringify({
        type: 'response.create'
    }));
};

ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    
    if (data.type === 'response.text.delta') {
        // Stream tokens as they arrive (process.stdout is Node-only;
        // append to the DOM in a browser)
        process.stdout.write(data.delta);
    } else if (data.type === 'response.done') {
        console.log('\nResponse complete');
    } else if (data.type === 'error') {
        console.error('Error:', data.error);
    }
};

ws.onerror = (error) => {
    console.error('WebSocket error:', error);
};

ws.onclose = () => {
    console.log('Connection closed');
};

Audio Streaming

const ws = new WebSocket(
    'ws://localhost:8787/v1/realtime?provider=openai&apiKey=YOUR_API_KEY'
);

ws.onopen = () => {
    // Configure for audio
    ws.send(JSON.stringify({
        type: 'session.update',
        session: {
            modalities: ['text', 'audio'],
            voice: 'alloy',
            input_audio_format: 'pcm16',
            output_audio_format: 'pcm16'
        }
    }));
    
    // Stream audio from microphone
    navigator.mediaDevices.getUserMedia({ audio: true })
        .then(stream => {
            const audioContext = new AudioContext({ sampleRate: 24000 });
            const source = audioContext.createMediaStreamSource(stream);
            // Note: ScriptProcessorNode is deprecated; prefer an
            // AudioWorklet in production code
            const processor = audioContext.createScriptProcessor(4096, 1, 1);
            
            processor.onaudioprocess = (e) => {
                const audioData = e.inputBuffer.getChannelData(0);
                const pcm16 = convertFloat32ToPCM16(audioData);
                // Encode byte-by-byte: spreading a large buffer into one
                // String.fromCharCode call can overflow the argument limit
                let binary = '';
                for (let i = 0; i < pcm16.length; i++) {
                    binary += String.fromCharCode(pcm16[i]);
                }
                
                ws.send(JSON.stringify({
                    type: 'input_audio_buffer.append',
                    audio: btoa(binary)
                }));
            };
            
            source.connect(processor);
            processor.connect(audioContext.destination);
        });
};

ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    
    if (data.type === 'response.audio.delta') {
        // Play audio chunk
        const audioData = atob(data.delta);
        playAudioChunk(audioData);
    }
};

function convertFloat32ToPCM16(float32Array) {
    const pcm16 = new Int16Array(float32Array.length);
    for (let i = 0; i < float32Array.length; i++) {
        const s = Math.max(-1, Math.min(1, float32Array[i]));
        pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
    }
    return new Uint8Array(pcm16.buffer);
}
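The audio example above calls playAudioChunk without defining it. A minimal sketch of one possible implementation, assuming a 24 kHz mono PCM16 stream and a browser environment (atob, AudioContext); chunks are scheduled back-to-back for gapless playback:

```javascript
// Decode little-endian PCM16 bytes back to Float32 samples
// (the inverse of convertFloat32ToPCM16).
function pcm16ToFloat32(bytes) {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const float32 = new Float32Array(bytes.byteLength / 2);
    for (let i = 0; i < float32.length; i++) {
        const s = view.getInt16(i * 2, true);
        float32[i] = s < 0 ? s / 0x8000 : s / 0x7FFF;
    }
    return float32;
}

let playbackContext;
let playbackTime = 0;

function playAudioChunk(base64Audio) {
    if (!playbackContext) playbackContext = new AudioContext({ sampleRate: 24000 });

    // Decode base64 into raw bytes
    const raw = atob(base64Audio);
    const bytes = new Uint8Array(raw.length);
    for (let i = 0; i < raw.length; i++) bytes[i] = raw.charCodeAt(i);

    // Wrap the samples in an AudioBuffer
    const samples = pcm16ToFloat32(bytes);
    const buffer = playbackContext.createBuffer(1, samples.length, 24000);
    buffer.getChannelData(0).set(samples);

    // Queue each chunk right after the previous one
    const source = playbackContext.createBufferSource();
    source.buffer = buffer;
    source.connect(playbackContext.destination);
    playbackTime = Math.max(playbackTime, playbackContext.currentTime);
    source.start(playbackTime);
    playbackTime += buffer.duration;
}
```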

Best Practices

  • Use PCM16 at a 24 kHz sample rate for best compatibility
  • Keep audio chunks around 100 ms (2,400 samples at 24 kHz) for optimal latency
  • Buffer audio on the client side to handle network jitter
  • Implement reconnection logic with exponential backoff
  • Monitor connection health with ping/pong frames
  • Close connections gracefully when done
  • Always handle error events from the server
  • Implement timeout logic for responses
  • Provide fallback behavior for connection failures
  • Use audio compression where appropriate
  • Implement voice activity detection to reduce unnecessary data
  • Cache session configuration to avoid repeated updates
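The reconnection advice above can be sketched as exponential backoff with a cap and jitter. connectWithBackoff and backoffDelay are illustrative names, not part of the gateway API:

```javascript
// Jittered exponential backoff: delay doubles per attempt, is capped
// at maxDelayMs, and is randomized between 50% and 100% of that value
// to avoid thundering-herd reconnects.
function backoffDelay(attempt, baseDelayMs = 500, maxDelayMs = 30000) {
    const delay = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
    return delay / 2 + Math.random() * (delay / 2);
}

function connectWithBackoff(url) {
    let attempt = 0;

    function connect() {
        const ws = new WebSocket(url);
        ws.onopen = () => { attempt = 0; };  // reset backoff on success
        ws.onclose = () => {
            setTimeout(connect, backoffDelay(attempt));
            attempt++;
        };
        return ws;
    }

    return connect();
}
```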

Supported Providers

Realtime API support:
  • OpenAI: Full support with gpt-4o-realtime-preview
  • Azure OpenAI: Supported on compatible deployments
Check provider documentation for model availability and pricing.

Related Documentation

  • Chat Completions: standard chat API
  • Audio Speech: text-to-speech API
  • Streaming: HTTP streaming guide
