Overview
The Realtime API enables low-latency, multi-modal conversational experiences using WebSocket connections. It supports text and audio as both input and output, as well as function calling. Key benefits:- Native speech-to-speech: Low latency by skipping intermediate text format
- Natural voices: Models can laugh, whisper, and follow tone directions
- Simultaneous multimodal output: Get both text and audio in real-time
Connection Setup
The Realtime API is a stateful, event-based API that communicates over WebSocket.WebSocket Connection
Connection Parameters
The Realtime model to use. Required for Azure, optional for OpenAI.Examples:
gpt-4o-realtime-preview, gpt-4o-realtime-preview-2024-10-01Optional call identifier for tracking purposes
Additional query parameters for the WebSocket connection
Additional headers for the WebSocket connection
WebSocket-specific connection options
Session Management
Update Session Configuration
Update session settings at any time during the connection.Session Configuration Options
System instructions for the model (e.g., “Be succinct”, “Speak quickly”)
Voice for audio output. Options:
alloy, echo, shimmerCan only be updated before any audio output has been generated.Format for input audio:
pcm16 or g711_ulaw or g711_alawFormat for output audio:
pcm16 or g711_ulaw or g711_alawVoice Activity Detection (VAD) configuration:
type:"server_vad"ornullto disablethreshold: Detection sensitivity (0-1)prefix_padding_ms: Audio before speech startssilence_duration_ms: Silence duration to end turn
Function tools available to the model
How model chooses tools:
auto, none, required, or force a specific functionSampling temperature (0-2). Higher = more random.
Maximum tokens per response (1-4096 or
"inf")Receiving Events
Iterate Through Events
Receive Single Event
Complete Example
Async Usage
Notes
- Installation: Requires
openai[realtime]package:pip install openai[realtime] - Context manager: Connection is automatically closed when exiting the
withblock - Manual connection: Use
.enter()method if you need to manage connection lifecycle manually - Azure: Model parameter is required for Azure Realtime API
- Session configuration can be updated anytime except
voiceandmodel