Overview

Realtime APIs enable bidirectional, low-latency communication with LLMs over WebSocket connections. This powers use cases like:
  • Voice assistants with real-time transcription and responses
  • Interactive chat with streaming function calls
  • Live translation and interpretation
  • Real-time audio processing
The Gateway provides a WebSocket server that proxies connections to provider realtime endpoints (currently OpenAI’s Realtime API).
Realtime APIs are different from HTTP streaming. They use WebSocket for full-duplex communication, allowing you to send and receive messages simultaneously.

How It Works

  1. Client establishes WebSocket connection to Gateway
  2. Gateway creates outgoing WebSocket connection to provider
  3. Messages are proxied bidirectionally with observability
  4. Gateway tracks events, tokens, and costs in real-time
  5. Connection closes when either side disconnects
Client <--> Gateway <--> Provider (OpenAI)
         (WebSocket)    (WebSocket)

Supported Providers

Currently supported:
  • OpenAI Realtime API (gpt-4o-realtime-preview, gpt-4o-mini-realtime-preview)

Getting Started

WebSocket Connection

Connect to the Gateway’s realtime endpoint:
wss://api.portkey.ai/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01

Authentication

Include Portkey headers in the WebSocket upgrade request:
const ws = new WebSocket('wss://api.portkey.ai/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01', {
  headers: {
    'x-portkey-api-key': 'PORTKEY_API_KEY',
    'x-portkey-provider': 'openai',
    'Authorization': 'Bearer OPENAI_API_KEY'
  }
});
You can also use Virtual Keys instead of passing the OpenAI API key directly:
'x-portkey-virtual-key': 'openai-virtual-key-xyz'
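With a Virtual Key, the provider key is stored in Portkey, so the `Authorization` header is no longer needed. A minimal sketch of the upgrade headers (the key values are placeholders):

```javascript
// Placeholder values; substitute your own Portkey API key and Virtual Key.
const headers = {
  'x-portkey-api-key': 'PORTKEY_API_KEY',
  'x-portkey-virtual-key': 'openai-virtual-key-xyz',
};

// Pass these in the WebSocket upgrade request, e.g.:
// new WebSocket('wss://api.portkey.ai/v1/realtime?model=...', [], { headers });
```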

Usage Examples

const ws = new WebSocket(
  'wss://api.portkey.ai/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01',
  [],
  {
    headers: {
      'x-portkey-api-key': 'PORTKEY_API_KEY',
      'x-portkey-provider': 'openai',
      'Authorization': 'Bearer OPENAI_API_KEY'
    }
  }
);

// Connection opened
ws.addEventListener('open', (event) => {
  console.log('Connected to Portkey Gateway');
  
  // Send a message
  ws.send(JSON.stringify({
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'user',
      content: [{
        type: 'input_text',
        text: 'Hello, how are you?'
      }]
    }
  }));
  
  // Request response
  ws.send(JSON.stringify({
    type: 'response.create'
  }));
});

// Listen for messages
ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  console.log('Received:', data);
  
  if (data.type === 'response.text.delta') {
    process.stdout.write(data.delta);
  }
  
  if (data.type === 'response.done') {
    console.log('\nResponse complete');
  }
});

// Handle errors
ws.addEventListener('error', (error) => {
  console.error('WebSocket error:', error);
});

// Handle close
ws.addEventListener('close', (event) => {
  console.log('Disconnected:', event.code, event.reason);
});

OpenAI Realtime API Events

Client Events (Send to Gateway)

Create Conversation Item

{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "role": "user",
    "content": [{
      "type": "input_text",
      "text": "Hello!"
    }]
  }
}

Request Response

{
  "type": "response.create",
  "response": {
    "modalities": ["text", "audio"],
    "instructions": "You are a helpful assistant."
  }
}

Update Session

{
  "type": "session.update",
  "session": {
    "modalities": ["text", "audio"],
    "voice": "alloy",
    "temperature": 0.8
  }
}

Server Events (Receive from Gateway)

Session Created

{
  "type": "session.created",
  "session": {
    "id": "sess_123",
    "model": "gpt-4o-realtime-preview-2024-10-01",
    "modalities": ["text", "audio"]
  }
}

Response Text Delta

{
  "type": "response.text.delta",
  "delta": "Hello",
  "response_id": "resp_123",
  "item_id": "item_456"
}

Response Audio Delta

{
  "type": "response.audio.delta",
  "delta": "base64_audio_chunk",
  "response_id": "resp_123",
  "item_id": "item_456"
}

Response Done

{
  "type": "response.done",
  "response": {
    "id": "resp_123",
    "status": "completed",
    "output": [...]
  }
}

Audio Streaming

Send Audio Input

// Create audio conversation item
ws.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [{
      type: 'input_audio',
      audio: base64AudioData  // Base64 encoded PCM16 audio
    }]
  }
}));

// Request response
ws.send(JSON.stringify({
  type: 'response.create',
  response: {
    modalities: ['text', 'audio']
  }
}));

Receive Audio Output

ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  
  if (data.type === 'response.audio.delta') {
    // data.delta contains base64 encoded PCM16 audio
    const audioChunk = Buffer.from(data.delta, 'base64');
    playAudio(audioChunk);
  }
});

Implementation Details

Gateway WebSocket Handler

From src/handlers/realtimeHandler.ts:
export async function realTimeHandler(c: Context): Promise<Response> {
  try {
    const requestHeaders = Object.fromEntries(c.req.raw.headers);
    const providerOptions = constructConfigFromRequestHeaders(requestHeaders);
    const provider = providerOptions.provider ?? '';
    const apiConfig: ProviderAPIConfig = Providers[provider].api;
    
    // Get provider URL and options
    const url = getURLForOutgoingConnection(apiConfig, providerOptions, c.req.url, c);
    const options = await getOptionsForOutgoingConnection(apiConfig, providerOptions, url, c);

    const sessionOptions = {
      id: crypto.randomUUID(),
      providerOptions: {
        ...providerOptions,
        requestURL: url,
        rubeusURL: 'realtime',
      },
      requestHeaders,
      requestParams: {},
    };

    // Create WebSocket pair
    const webSocketPair = new WebSocketPair();
    const client = webSocketPair[0];
    const server = webSocketPair[1];

    server.accept();

    // Connect to provider
    let outgoingWebSocket: WebSocket = await getOutgoingWebSocket(url, options);
    const eventParser = new RealtimeLlmEventParser();
    addListeners(outgoingWebSocket, eventParser, server, c, sessionOptions);

    return new Response(null, {
      status: 101,
      webSocket: client,
    });
  } catch (err: any) {
    console.error('realtimeHandler error: ', err.message);
    return new Response(
      JSON.stringify({
        status: 'failure',
        message: 'Something went wrong',
      }),
      { status: 500 }
    );
  }
}

Event Parsing and Observability

The Gateway parses WebSocket events to track:
  • Token usage (input/output)
  • Cost calculation
  • Response latency
  • Error rates
  • Custom metadata
class RealtimeLlmEventParser {
  parseEvent(event: any) {
    // Extract tokens, costs, and metadata
    // Track in observability system
  }
}
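As a sketch of the idea (not the Gateway's actual implementation), token usage can be read from OpenAI's `response.done` server event, which carries a `response.usage` object:

```javascript
// Minimal sketch of usage extraction from a Realtime server event.
// Assumes OpenAI's `response.done` shape: { response: { usage: {...} } }.
function extractUsage(event) {
  if (event.type !== 'response.done' || !event.response?.usage) return null;
  const { input_tokens, output_tokens, total_tokens } = event.response.usage;
  return {
    inputTokens: input_tokens,
    outputTokens: output_tokens,
    totalTokens: total_tokens,
  };
}
```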

Advanced Patterns

Function Calling

// Define functions
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    tools: [
      {
        type: 'function',
        name: 'get_weather',
        description: 'Get weather for a location',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' }
          },
          required: ['location']
        }
      }
    ]
  }
}));

// Handle function calls
ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  
  if (data.type === 'response.function_call_arguments.done') {
    const functionName = data.name;
    const args = JSON.parse(data.arguments);
    
    // Execute function
    const result = executeFunction(functionName, args);
    
    // Send result back
    ws.send(JSON.stringify({
      type: 'conversation.item.create',
      item: {
        type: 'function_call_output',
        call_id: data.call_id,
        output: JSON.stringify(result)
      }
    }));
    
    // Ask the model to respond using the function result
    ws.send(JSON.stringify({
      type: 'response.create'
    }));
  }
});

Multi-Turn Conversation

const conversation = [];

function addMessage(role, content) {
  conversation.push({ role, content });
  
  ws.send(JSON.stringify({
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: role,
      content: [{ type: 'input_text', text: content }]
    }
  }));
}

function requestResponse() {
  // The Realtime API keeps conversation state server-side for the session,
  // so a bare response.create includes all previously created items.
  ws.send(JSON.stringify({
    type: 'response.create'
  }));
}

// Usage
addMessage('user', 'What is the capital of France?');
requestResponse();

// Later
addMessage('user', 'What is its population?');
requestResponse();

Voice Assistant

// Configure for voice
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    modalities: ['text', 'audio'],
    voice: 'alloy',
    input_audio_format: 'pcm16',
    output_audio_format: 'pcm16',
    turn_detection: {
      type: 'server_vad',
      threshold: 0.5,
      prefix_padding_ms: 300,
      silence_duration_ms: 500
    }
  }
}));

// Stream audio from microphone
microphone.on('data', (audioChunk) => {
  ws.send(JSON.stringify({
    type: 'input_audio_buffer.append',
    audio: audioChunk.toString('base64')
  }));
});

// Play audio responses
ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  
  if (data.type === 'response.audio.delta') {
    const audioChunk = Buffer.from(data.delta, 'base64');
    speaker.write(audioChunk);
  }
});

Configuration Options

Session Configuration

{
  "type": "session.update",
  "session": {
    "modalities": ["text", "audio"],
    "voice": "alloy",
    "instructions": "You are a helpful assistant.",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "input_audio_transcription": {
      "model": "whisper-1"
    },
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5
    },
    "temperature": 0.8,
    "max_response_output_tokens": 1000
  }
}

Voice Options

  • alloy
  • echo
  • fable
  • onyx
  • nova
  • shimmer

Audio Formats

  • pcm16 - 16-bit PCM audio at 24kHz
  • g711_ulaw - G.711 μ-law audio at 8kHz
  • g711_alaw - G.711 A-law audio at 8kHz
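When capturing microphone audio in a browser or Node, raw samples typically arrive as 32-bit floats in [-1, 1] and must be converted to 16-bit little-endian PCM before base64 encoding for `pcm16`. A minimal Node sketch:

```javascript
// Convert Float32 samples in [-1, 1] to 16-bit little-endian PCM,
// then base64-encode for events like `input_audio_buffer.append`.
function floatTo16BitPcmBase64(float32Samples) {
  const buf = Buffer.alloc(float32Samples.length * 2);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    buf.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2);
  }
  return buf.toString('base64');
}
```

Remember that `pcm16` is expected at 24kHz, so resample first if your capture device runs at a different rate.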

Error Handling

ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  
  if (data.type === 'error') {
    console.error('Realtime API error:', data.error);
    
    switch (data.error.code) {
      case 'rate_limit_exceeded':
        // Handle rate limit
        break;
      case 'invalid_request':
        // Handle invalid request
        break;
      default:
        // Handle other errors
    }
  }
});

ws.addEventListener('close', (event) => {
  if (event.code !== 1000) {
    console.error('Abnormal close:', event.code, event.reason);
    // Implement reconnection logic
  }
});

Best Practices

WebSocket connections can drop. Implement exponential backoff reconnection:
function connectWithRetry(url, options, retries = 5, delay = 1000) {
  const ws = new WebSocket(url, [], options);
  
  ws.onerror = () => {
    if (retries > 0) {
      // Double the delay on each attempt: 1s, 2s, 4s, ...
      setTimeout(() => connectWithRetry(url, options, retries - 1, delay * 2), delay);
    }
  };
  
  // Note: in production, re-register message listeners in an onopen
  // callback so reconnected sockets behave like the original.
  return ws;
}
Buffer audio chunks before playback to smooth out network jitter and avoid choppy audio.
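One way to do this is a small jitter buffer that queues incoming chunks and only starts draining once a minimum backlog has accumulated. A minimal sketch (the threshold is illustrative):

```javascript
// Minimal jitter buffer: hold chunks until `minChunks` have arrived,
// then hand them out in FIFO order so playback proceeds smoothly.
class JitterBuffer {
  constructor(minChunks = 3) {
    this.minChunks = minChunks;
    this.queue = [];
    this.started = false;
  }
  push(chunk) {
    this.queue.push(chunk);
    if (this.queue.length >= this.minChunks) this.started = true;
  }
  // Returns the next chunk to play, or null if still buffering / empty.
  pull() {
    if (!this.started || this.queue.length === 0) return null;
    return this.queue.shift();
  }
}
```

In a real player you would call `pull()` from a timer or audio callback paced to the output sample rate.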
Detect stale connections with a protocol-level ping (the Realtime API defines no application-level ping event; the `ws` library's `ws.ping()` sends a WebSocket ping frame):
setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.ping();
  }
}, 30000);
Always clean up resources when done:
ws.close(1000, 'Normal closure');
microphone.stop();
speaker.close();
Use Portkey Virtual Keys instead of hardcoding API keys for better security and management.

Related

  • Streaming - HTTP streaming responses
  • Multi-Modal - audio and vision capabilities
  • Timeouts - configure connection timeouts
  • Observability - monitor realtime API usage
