The voice runtime is the core engine that powers NAVAI’s voice-first capabilities. It manages the connection to OpenAI’s Realtime API, handles bidirectional audio streaming, and coordinates tool execution.
How the runtime works
The voice runtime establishes a WebRTC connection between your application and OpenAI’s Realtime API, enabling low-latency voice interaction.
1. Initialize runtime configuration: load routes, functions, and environment settings to configure the voice agent.
2. Request client secret: your app calls your backend to generate an ephemeral OpenAI client secret.
3. Build agent with tools: create a RealtimeAgent with navigation and function execution tools.
4. Establish WebRTC connection: connect to OpenAI using the client secret and start audio streaming.
5. Handle voice interaction: process audio input, execute tools, and stream audio responses back to the user.
Web runtime
The web runtime uses the @openai/agents SDK with React hooks for state management.
Starting a voice session
The useWebVoiceAgent hook manages the complete session lifecycle:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:104-149
const start = useCallback(async (): Promise<void> => {
  if (status === "connecting" || status === "connected") {
    return;
  }
  setError(null);
  setStatus("connecting");
  try {
    // 1. Load runtime configuration (routes + functions)
    const runtimeConfig = await runtimeConfigPromise;

    // 2. Request client secret from backend
    const requestPayload = runtimeConfig.modelOverride
      ? { model: runtimeConfig.modelOverride }
      : {};
    const secretPayload = await backendClient.createClientSecret(requestPayload);

    // 3. Load backend functions list
    const backendFunctionsResult = await backendClient.listFunctions();

    // 4. Build agent with tools
    const { agent, warnings } = await buildNavaiAgent({
      navigate: options.navigate,
      routes: runtimeConfig.routes,
      functionModuleLoaders: runtimeConfig.functionModuleLoaders,
      backendFunctions: backendFunctionsResult.functions,
      executeBackendFunction: backendClient.executeFunction
    });
    emitWarnings([...runtimeConfig.warnings, ...backendFunctionsResult.warnings, ...warnings]);

    // 5. Create and connect session
    const session = new RealtimeSession(agent);
    if (runtimeConfig.modelOverride) {
      await session.connect({ apiKey: secretPayload.value, model: runtimeConfig.modelOverride });
    } else {
      await session.connect({ apiKey: secretPayload.value });
    }
    sessionRef.current = session;
    setStatus("connected");
  } catch (startError) {
    const message = formatError(startError);
    setError(message);
    setStatus("error");
    try {
      sessionRef.current?.close();
    } catch {
      // ignore close errors during bootstrap
    }
    sessionRef.current = null;
  }
}, [backendClient, options.navigate, runtimeConfigPromise, status]);
Session states
The runtime tracks connection state through a simple state machine:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:10
type VoiceStatus = "idle" | "connecting" | "connected" | "error";
idle: Initial state before any connection attempt. The agent is not running and no resources are allocated.
connecting: Actively requesting the client secret, building the agent, and establishing the WebRTC connection.
connected: WebRTC connection established successfully. Audio streaming and voice interaction are active.
error: Connection failed or was interrupted. Check the error property for details.
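The status union maps naturally onto UI copy. A small sketch (the labels and the statusLabel helper are illustrative, not part of the NAVAI API):

```typescript
type VoiceStatus = "idle" | "connecting" | "connected" | "error";

// Illustrative UI labels for each status (not part of the NAVAI API).
const STATUS_LABELS: Record<VoiceStatus, string> = {
  idle: "Tap to start",
  connecting: "Connecting…",
  connected: "Listening",
  error: "Something went wrong",
};

function statusLabel(status: VoiceStatus): string {
  return STATUS_LABELS[status];
}
```

Because the union is exhaustive, the compiler flags a missing label if a new status is ever added.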
Stopping the session
Cleanly close the connection and release resources:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:89-96
const stop = useCallback(() => {
  try {
    sessionRef.current?.close();
  } finally {
    sessionRef.current = null;
    setStatus("idle");
  }
}, []);
Always call stop() when unmounting your component or when the user navigates away to properly close the WebRTC connection and prevent memory leaks.
Mobile runtime
The mobile runtime provides additional flexibility through pluggable transport layers to support different WebRTC implementations.
Transport abstraction
Mobile apps use a transport interface that abstracts WebRTC implementation details:
// From packages/voice-mobile/src/transport.ts
export type NavaiRealtimeTransport = {
  connect: (options: NavaiRealtimeTransportConnectOptions) => Promise<void>;
  disconnect: () => void;
  state: NavaiRealtimeTransportState;
  onAudioData?: (data: ArrayBuffer) => void;
  onMessage?: (message: unknown) => void;
  onError?: (error: Error) => void;
  sendAudio: (data: ArrayBuffer) => void;
  sendMessage: (message: unknown) => void;
};
React Native WebRTC transport
NAVAI provides a pre-built transport for react-native-webrtc:
// From packages/voice-mobile/src/react-native-webrtc.ts
export function createReactNativeWebRtcTransport(
  options: CreateReactNativeWebRtcTransportOptions
): NavaiRealtimeTransport {
  // Create peer connection
  const peerConnection = new RTCPeerConnection({
    iceServers: options.iceServers,
  });
  // Setup audio tracks
  // Setup data channels
  // Handle connection lifecycle
  return transport;
}
You can implement your own transport to use different WebRTC libraries or custom audio streaming solutions.
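As a sketch of what a custom implementation involves, here is a minimal in-memory loopback transport that satisfies the interface. The state union and connect options are re-declared locally for self-containment and may differ from the real definitions in packages/voice-mobile/src/transport.ts:

```typescript
// Local stand-ins for types from packages/voice-mobile/src/transport.ts;
// the real definitions may differ.
type NavaiRealtimeTransportState = "idle" | "connecting" | "connected" | "closed";
type NavaiRealtimeTransportConnectOptions = { clientSecret: string };

type NavaiRealtimeTransport = {
  connect: (options: NavaiRealtimeTransportConnectOptions) => Promise<void>;
  disconnect: () => void;
  state: NavaiRealtimeTransportState;
  onAudioData?: (data: ArrayBuffer) => void;
  onMessage?: (message: unknown) => void;
  onError?: (error: Error) => void;
  sendAudio: (data: ArrayBuffer) => void;
  sendMessage: (message: unknown) => void;
};

// A loopback transport: everything sent is echoed straight back.
// Useful for exercising session logic in tests without any network.
function createLoopbackTransport(): NavaiRealtimeTransport {
  const transport: NavaiRealtimeTransport = {
    state: "idle",
    connect(_options) {
      transport.state = "connected";
      return Promise.resolve();
    },
    disconnect() {
      transport.state = "closed";
    },
    sendAudio(data) {
      // Echo the buffer back to whoever registered onAudioData.
      transport.onAudioData?.(data);
    },
    sendMessage(message) {
      transport.onMessage?.(message);
    },
  };
  return transport;
}
```

A real transport would do the same state bookkeeping while delegating connect/sendAudio to its WebRTC library of choice.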
Mobile session lifecycle
The mobile runtime uses createNavaiMobileVoiceSession to manage sessions:
// From packages/voice-mobile/src/session.ts
export function createNavaiMobileVoiceSession(
  options: CreateNavaiMobileVoiceSessionOptions
): NavaiMobileVoiceSession {
  const session = {
    async start(input: StartNavaiMobileVoiceSessionInput) {
      // 1. Request client secret
      // 2. Create agent runtime
      // 3. Connect transport
      // 4. Start audio streaming
    },
    stop() {
      // Disconnect transport and cleanup
    },
    getSnapshot(): NavaiMobileVoiceSessionSnapshot {
      return {
        state: currentState,
        error: currentError
      };
    }
  };
  return session;
}
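The skeleton pairs a start/stop lifecycle with a getSnapshot accessor. A self-contained sketch of that pattern follows; it is illustrative only, with the real connection work (secret request, agent runtime, transport, audio) replaced by a supplied callback:

```typescript
type SessionState = "idle" | "connecting" | "connected" | "error";

type SessionSnapshot = { state: SessionState; error: string | null };

// Illustrative stand-in for createNavaiMobileVoiceSession: the same
// snapshot pattern, with connection work injected as a callback.
function createSessionSketch(connect: () => Promise<void>) {
  let state: SessionState = "idle";
  let error: string | null = null;
  return {
    async start(): Promise<void> {
      state = "connecting";
      try {
        await connect(); // secret request, agent runtime, transport, audio
        state = "connected";
      } catch (err) {
        error = err instanceof Error ? err.message : String(err);
        state = "error";
      }
    },
    stop(): void {
      state = "idle";
      error = null;
    },
    getSnapshot(): SessionSnapshot {
      return { state, error };
    },
  };
}
```

Returning a fresh snapshot object on every call keeps consumers from mutating internal state, which is why the real API exposes getSnapshot rather than the state variables themselves.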
OpenAI Realtime API connection
The runtime connects to OpenAI’s Realtime API using ephemeral client secrets generated by your backend.
Client secret lifecycle
Backend requests client secret
Your server calls OpenAI’s /v1/realtime/client_secrets endpoint with your API key:
// From packages/voice-backend/src/index.ts:21
const OPENAI_CLIENT_SECRETS_URL = "https://api.openai.com/v1/realtime/client_secrets";
OpenAI returns ephemeral credential
OpenAI responds with a time-limited client secret:
// From packages/voice-backend/src/index.ts:52-56
export type OpenAIRealtimeClientSecretResponse = {
  value: string;
  expires_at: number;
  session?: unknown;
};
Client uses secret to connect
Your frontend or mobile app uses this secret as a temporary API key to establish a WebRTC connection.
Secret expires automatically
After the TTL expires (default 600 seconds), the secret becomes invalid and cannot be reused.
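Putting the lifecycle together, a backend helper might look like the sketch below. The URL constant and response type come from the excerpts above; the requestClientSecret name and the request body shape are assumptions (consult OpenAI’s Realtime API docs and the actual voice-backend source). fetchImpl is injectable so the helper can be exercised without network access:

```typescript
type OpenAIRealtimeClientSecretResponse = {
  value: string;
  expires_at: number;
  session?: unknown;
};

const OPENAI_CLIENT_SECRETS_URL =
  "https://api.openai.com/v1/realtime/client_secrets";

// Hypothetical helper: exchange a server-side API key for an ephemeral
// client secret. The body is abbreviated for illustration; see OpenAI's
// Realtime docs for the full set of session fields.
async function requestClientSecret(
  apiKey: string,
  model: string,
  fetchImpl: typeof fetch = fetch
): Promise<OpenAIRealtimeClientSecretResponse> {
  const response = await fetchImpl(OPENAI_CLIENT_SECRETS_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ session: { type: "realtime", model } }),
  });
  if (!response.ok) {
    throw new Error(`Client secret request failed: ${response.status}`);
  }
  return (await response.json()) as OpenAIRealtimeClientSecretResponse;
}
```

The server-side API key never leaves your backend; only the short-lived value field is forwarded to the client.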
Configuring session parameters
You can customize the AI model, voice, and instructions when requesting a client secret:
// From packages/voice-backend/src/index.ts:42-50
export type CreateClientSecretRequest = {
  model?: string;
  voice?: string;
  instructions?: string;
  language?: string;
  voiceAccent?: string;
  voiceTone?: string;
  apiKey?: string;
};
OpenAI provides several voice options:
alloy: Neutral and balanced
echo: Warm and upbeat
fable: British accent, expressive
onyx: Deep and authoritative
nova: Energetic and friendly
shimmer: Soft and gentle
marin: Natural and conversational (default in NAVAI)
Set the voice in your backend options when requesting the client secret.
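For illustration, a request payload using the CreateClientSecretRequest fields shown above; whether you pass these per request or configure server-side defaults depends on your backend setup:

```typescript
// Field names follow the CreateClientSecretRequest type shown above.
type CreateClientSecretRequest = {
  model?: string;
  voice?: string;
  instructions?: string;
  language?: string;
  voiceAccent?: string;
  voiceTone?: string;
  apiKey?: string;
};

const request: CreateClientSecretRequest = {
  voice: "marin",          // NAVAI's default voice
  language: "English",
  voiceTone: "friendly",
};
```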
Session instructions
The backend builds session instructions by combining base instructions with language, accent, and tone preferences:
// From packages/voice-backend/src/index.ts:134-158
function buildSessionInstructions(input: {
  baseInstructions: string;
  language?: string;
  voiceAccent?: string;
  voiceTone?: string;
}): string {
  const lines = [input.baseInstructions.trim()];
  const language = readOptional(input.language);
  const voiceAccent = readOptional(input.voiceAccent);
  const voiceTone = readOptional(input.voiceTone);
  if (language) {
    lines.push(`Always reply in ${language}.`);
  }
  if (voiceAccent) {
    lines.push(`Use a ${voiceAccent} accent while speaking.`);
  }
  if (voiceTone) {
    lines.push(`Use a ${voiceTone} tone while speaking.`);
  }
  return lines.join("\n");
}
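To see the composition concretely, here is the function exercised end to end. readOptional is referenced but not shown in the excerpt, so the stand-in below, which treats empty or whitespace-only strings as absent, is an assumption:

```typescript
// Assumed stand-in for readOptional (not shown in the source excerpt):
// treats empty/whitespace-only strings as absent.
function readOptional(value?: string): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

function buildSessionInstructions(input: {
  baseInstructions: string;
  language?: string;
  voiceAccent?: string;
  voiceTone?: string;
}): string {
  const lines = [input.baseInstructions.trim()];
  const language = readOptional(input.language);
  const voiceAccent = readOptional(input.voiceAccent);
  const voiceTone = readOptional(input.voiceTone);
  if (language) {
    lines.push(`Always reply in ${language}.`);
  }
  if (voiceAccent) {
    lines.push(`Use a ${voiceAccent} accent while speaking.`);
  }
  if (voiceTone) {
    lines.push(`Use a ${voiceTone} tone while speaking.`);
  }
  return lines.join("\n");
}

const instructions = buildSessionInstructions({
  baseInstructions: "You are a helpful in-app assistant.",
  language: "French",
  voiceTone: "calm",
});
// Produces three lines: the base prompt, the language rule, the tone rule.
```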
Audio streaming and processing
The runtime handles bidirectional audio streaming between your app and OpenAI.
Web audio streaming
In web apps, the @openai/agents SDK handles audio capture and playback using the Web Audio API:
// Browser automatically handles:
// - Microphone access via getUserMedia
// - Audio encoding for WebRTC
// - Audio decoding and playback
// - Echo cancellation and noise suppression
Mobile audio streaming
Mobile apps require platform-specific audio handling:
Audio capture
Capture microphone input using React Native modules:
// Using react-native-webrtc
const localStream = await mediaDevices.getUserMedia({
  audio: true,
  video: false
});

Audio transmission
Send audio data through the WebRTC data channel:
transport.sendAudio(audioBuffer);

Audio playback
Receive and play audio from OpenAI:
transport.onAudioData = (data) => {
  // Decode and play audio buffer
  playAudioBuffer(data);
};
Error handling
The runtime provides comprehensive error handling at each stage:
Connection errors
// From packages/voice-frontend/src/useWebVoiceAgent.ts:137-148
catch (startError) {
  const message = formatError(startError);
  setError(message);
  setStatus("error");
  try {
    sessionRef.current?.close();
  } catch {
    // ignore close errors during bootstrap
  }
  sessionRef.current = null;
}
Common connection errors include:
Invalid client secret: Check your backend API key configuration
Network timeout: Verify network connectivity and firewall settings
WebRTC not supported: Ensure the browser/platform supports WebRTC
Microphone permission denied: Request microphone access before starting
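For the last item, you can pre-flight the microphone permission before calling start(). The sketch below takes the getUserMedia function as a parameter (in a browser, pass (c) => navigator.mediaDevices.getUserMedia(c)); the structural types keep it independent of DOM typings and are assumptions for illustration:

```typescript
type TrackLike = { stop(): void };
type StreamLike = { getTracks(): TrackLike[] };

// Pre-flight check: request the microphone, then release it immediately so
// the voice SDK can open its own stream later. Returns false on denial.
async function ensureMicrophonePermission(
  getUserMedia: (constraints: { audio: boolean }) => Promise<StreamLike>
): Promise<boolean> {
  try {
    const stream = await getUserMedia({ audio: true });
    for (const track of stream.getTracks()) {
      track.stop();
    }
    return true;
  } catch {
    return false;
  }
}
```

Running this on a user gesture (for example, the button that starts the session) avoids permission prompts appearing mid-connection.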
Backend errors
The backend client handles HTTP errors gracefully:
// From packages/voice-frontend/src/backend.ts:94-96
if (!response.ok) {
  throw new Error(await readTextSafe(response));
}
Function execution errors
Function execution errors are caught and returned to the AI agent:
// From packages/voice-frontend/src/agent.ts:116-123
try {
  const result = await frontendDefinition.run(payload ?? {}, options);
  return { ok: true, function_name: frontendDefinition.name, source: frontendDefinition.source, result };
} catch (error) {
  return {
    ok: false,
    function_name: frontendDefinition.name,
    error: "Function execution failed.",
    details: toErrorMessage(error)
  };
}
Runtime warnings
The runtime collects warnings during initialization to help you debug configuration issues:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:44-50
function emitWarnings(warnings: string[]): void {
  for (const warning of warnings) {
    if (warning.trim().length > 0) {
      console.warn(warning);
    }
  }
}
Common warnings include:
Function name conflicts between frontend and backend
Invalid function names that can’t be used as tool names
Failed to load function modules
Routes file not found or invalid
Check your browser console for runtime warnings during development. They provide valuable debugging information without blocking execution.
Connection latency
WebRTC provides low-latency audio streaming, typically 100-300ms round-trip. To optimize:
Use a backend server geographically close to your users
Request client secrets with appropriate TTL (longer for stable connections, shorter for security)
Reuse sessions instead of creating new connections frequently
Client secret TTL
Balance security and user experience when setting TTL:
// From packages/voice-backend/src/index.ts:22-23
const MIN_TTL_SECONDS = 10;
const MAX_TTL_SECONDS = 7200;
Short TTL (60-300s)
Best for: High-security applications, temporary voice features
Trade-offs: May need to reconnect during long conversations

Medium TTL (600-1800s)
Best for: Most applications (default is 600s)
Trade-offs: Good balance of security and user experience

Long TTL (3600-7200s)
Best for: Extended voice sessions, reduced backend calls
Trade-offs: Longer exposure window if credentials are compromised
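A backend will typically clamp requested TTLs into the allowed range. The clampTtlSeconds helper below is illustrative; the constants mirror the bounds shown above, but the actual voice-backend clamping logic may differ:

```typescript
const MIN_TTL_SECONDS = 10;
const MAX_TTL_SECONDS = 7200;

// Hypothetical helper: clamp a requested TTL into the accepted range,
// falling back to the 600s default for non-numeric input.
function clampTtlSeconds(requested: number, fallback = 600): number {
  if (!Number.isFinite(requested)) {
    return fallback;
  }
  return Math.min(MAX_TTL_SECONDS, Math.max(MIN_TTL_SECONDS, Math.round(requested)));
}
```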
Memory management
Properly clean up resources to prevent memory leaks:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:98-102
useEffect(() => {
  return () => {
    stop();
  };
}, [stop]);
Next steps
UI navigation: Learn how voice commands trigger UI navigation
Function execution: Understand how to define and execute functions
Backend setup: Configure your backend server
Frontend integration: Integrate the voice runtime in your React app