The voice runtime is the core engine that powers NAVAI’s voice-first capabilities. It manages the connection to OpenAI’s Realtime API, handles bidirectional audio streaming, and coordinates tool execution.

How the runtime works

The voice runtime establishes a WebRTC connection between your application and OpenAI’s Realtime API, enabling low-latency voice interaction.
1. Initialize runtime configuration: load routes, functions, and environment settings to configure the voice agent.
2. Request client secret: your app calls your backend to generate an ephemeral OpenAI client secret.
3. Build agent with tools: create a RealtimeAgent with navigation and function execution tools.
4. Establish WebRTC connection: connect to OpenAI using the client secret and start audio streaming.
5. Handle voice interaction: process audio input, execute tools, and stream audio responses back to the user.

Web runtime

The web runtime uses the @openai/agents SDK with React hooks for state management.

Starting a voice session

The useWebVoiceAgent hook manages the complete session lifecycle:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:104-149
const start = useCallback(async (): Promise<void> => {
  if (status === "connecting" || status === "connected") {
    return;
  }

  setError(null);
  setStatus("connecting");

  try {
    // 1. Load runtime configuration (routes + functions)
    const runtimeConfig = await runtimeConfigPromise;
    
    // 2. Request client secret from backend
    const requestPayload = runtimeConfig.modelOverride ? { model: runtimeConfig.modelOverride } : {};
    const secretPayload = await backendClient.createClientSecret(requestPayload);
    
    // 3. Load backend functions list
    const backendFunctionsResult = await backendClient.listFunctions();

    // 4. Build agent with tools
    const { agent, warnings } = await buildNavaiAgent({
      navigate: options.navigate,
      routes: runtimeConfig.routes,
      functionModuleLoaders: runtimeConfig.functionModuleLoaders,
      backendFunctions: backendFunctionsResult.functions,
      executeBackendFunction: backendClient.executeFunction
    });
    emitWarnings([...runtimeConfig.warnings, ...backendFunctionsResult.warnings, ...warnings]);

    // 5. Create and connect session
    const session = new RealtimeSession(agent);

    if (runtimeConfig.modelOverride) {
      await session.connect({ apiKey: secretPayload.value, model: runtimeConfig.modelOverride });
    } else {
      await session.connect({ apiKey: secretPayload.value });
    }

    sessionRef.current = session;
    setStatus("connected");
  } catch (startError) {
    const message = formatError(startError);
    setError(message);
    setStatus("error");

    try {
      sessionRef.current?.close();
    } catch {
      // ignore close errors during bootstrap
    }
    sessionRef.current = null;
  }
}, [backendClient, options.navigate, runtimeConfigPromise, status]);

Session states

The runtime tracks connection state through a simple state machine:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:10
type VoiceStatus = "idle" | "connecting" | "connected" | "error";
  • idle: Initial state before any connection attempt, and the state after stop(). The agent is not running and no resources are allocated.
  • connecting: A connection attempt is in progress: the runtime configuration is loading, the client secret is being fetched, and the session is being established.
  • connected: The session is live and audio is streaming in both directions.
  • error: The last connection attempt failed; the formatted error message is exposed through the hook.
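The allowed transitions can be made explicit. The table below is inferred from the start() and stop() handlers, not exported by NAVAI; it is a sketch for reasoning about the lifecycle:

```typescript
type VoiceStatus = "idle" | "connecting" | "connected" | "error";

// Transition table inferred from the hook's handlers; illustrative
// only, not part of the NAVAI API.
const transitions: Record<VoiceStatus, VoiceStatus[]> = {
  idle: ["connecting"],
  connecting: ["connected", "error"],
  connected: ["idle", "error"],
  error: ["connecting", "idle"],
};

function canTransition(from: VoiceStatus, to: VoiceStatus): boolean {
  return transitions[from].includes(to);
}
```

Note that start() returns early when the status is already "connecting" or "connected", which is why "idle" and "error" are the only states from which a new connection attempt can begin.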

Stopping the session

Cleanly close the connection and release resources:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:89-96
const stop = useCallback(() => {
  try {
    sessionRef.current?.close();
  } finally {
    sessionRef.current = null;
    setStatus("idle");
  }
}, []);
Always call stop() when unmounting your component or when the user navigates away to properly close the WebRTC connection and prevent memory leaks.

Mobile runtime

The mobile runtime provides additional flexibility through pluggable transport layers to support different WebRTC implementations.

Transport abstraction

Mobile apps use a transport interface that abstracts WebRTC implementation details:
// From packages/voice-mobile/src/transport.ts
export type NavaiRealtimeTransport = {
  connect: (options: NavaiRealtimeTransportConnectOptions) => Promise<void>;
  disconnect: () => void;
  state: NavaiRealtimeTransportState;
  onAudioData?: (data: ArrayBuffer) => void;
  onMessage?: (message: unknown) => void;
  onError?: (error: Error) => void;
  sendAudio: (data: ArrayBuffer) => void;
  sendMessage: (message: unknown) => void;
};
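The transport contract can be exercised without any WebRTC stack at all. Below is a hypothetical in-memory loopback transport, useful in unit tests; the State and ConnectOptions shapes are assumptions, since only the transport type itself appears above:

```typescript
// Assumed shapes: only NavaiRealtimeTransport itself is confirmed by
// the source; State and ConnectOptions are stand-ins for this sketch.
type NavaiRealtimeTransportState = "new" | "connected" | "closed";
type NavaiRealtimeTransportConnectOptions = { clientSecret: string };

type NavaiRealtimeTransport = {
  connect: (options: NavaiRealtimeTransportConnectOptions) => Promise<void>;
  disconnect: () => void;
  state: NavaiRealtimeTransportState;
  onAudioData?: (data: ArrayBuffer) => void;
  onMessage?: (message: unknown) => void;
  onError?: (error: Error) => void;
  sendAudio: (data: ArrayBuffer) => void;
  sendMessage: (message: unknown) => void;
};

// Hypothetical loopback transport: echoes outgoing audio and messages
// straight back to the callbacks, with no network involved.
function createLoopbackTransport(): NavaiRealtimeTransport {
  const transport: NavaiRealtimeTransport = {
    state: "new",
    async connect() {
      transport.state = "connected";
    },
    disconnect() {
      transport.state = "closed";
    },
    sendAudio(data) {
      transport.onAudioData?.(data);
    },
    sendMessage(message) {
      transport.onMessage?.(message);
    },
  };
  return transport;
}
```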

React Native WebRTC transport

NAVAI provides a pre-built transport for react-native-webrtc:
// From packages/voice-mobile/src/react-native-webrtc.ts
export function createReactNativeWebRtcTransport(
  options: CreateReactNativeWebRtcTransportOptions
): NavaiRealtimeTransport {
  // Create peer connection
  const peerConnection = new RTCPeerConnection({
    iceServers: options.iceServers,
  });
  
  // Setup audio tracks
  // Setup data channels
  // Handle connection lifecycle
  
  return transport;
}
You can implement your own transport to use different WebRTC libraries or custom audio streaming solutions.

Mobile session lifecycle

The mobile runtime uses createNavaiMobileVoiceSession to manage sessions:
// From packages/voice-mobile/src/session.ts
export function createNavaiMobileVoiceSession(
  options: CreateNavaiMobileVoiceSessionOptions
): NavaiMobileVoiceSession {
  const session = {
    async start(input: StartNavaiMobileVoiceSessionInput) {
      // 1. Request client secret
      // 2. Create agent runtime
      // 3. Connect transport
      // 4. Start audio streaming
    },
    
    stop() {
      // Disconnect transport and cleanup
    },
    
    getSnapshot(): NavaiMobileVoiceSessionSnapshot {
      return {
        state: currentState,
        error: currentError
      };
    }
  };
  
  return session;
}

OpenAI Realtime API connection

The runtime connects to OpenAI’s Realtime API using ephemeral client secrets generated by your backend.

Client secret lifecycle

1. Backend requests client secret

Your server calls OpenAI’s /v1/realtime/client_secrets endpoint with your API key:
// From packages/voice-backend/src/index.ts:21
const OPENAI_CLIENT_SECRETS_URL = "https://api.openai.com/v1/realtime/client_secrets";
2. OpenAI returns ephemeral credential

OpenAI responds with a time-limited client secret:
// From packages/voice-backend/src/index.ts:52-56
export type OpenAIRealtimeClientSecretResponse = {
  value: string;
  expires_at: number;
  session?: unknown;
};
3. Client uses secret to connect

Your frontend or mobile app uses this secret as a temporary API key to establish a WebRTC connection.
4. Secret expires automatically

After the TTL expires (default 600 seconds), the secret becomes invalid and cannot be reused.
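On the backend, the outgoing call to that endpoint can be sketched as a plain request object. The buildClientSecretRequest helper and the session payload shape are assumptions for illustration; consult OpenAI's Realtime API reference for the current request format:

```typescript
// Hypothetical helper building the request a backend sends to mint a
// client secret. The { session: { type, model } } payload shape is an
// assumption, not taken from the NAVAI source.
function buildClientSecretRequest(
  apiKey: string,
  model: string
): { url: string; method: string; headers: Record<string, string>; body: string } {
  return {
    url: "https://api.openai.com/v1/realtime/client_secrets",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ session: { type: "realtime", model } }),
  };
}
```

Keep the API key strictly on the server; only the short-lived secret in the response ever reaches the client.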

Configuring session parameters

You can customize the AI model, voice, and instructions when requesting a client secret:
// From packages/voice-backend/src/index.ts:42-50
export type CreateClientSecretRequest = {
  model?: string;
  voice?: string;
  instructions?: string;
  language?: string;
  voiceAccent?: string;
  voiceTone?: string;
  apiKey?: string;
};
OpenAI provides several voice options:
  • alloy: Neutral and balanced
  • echo: Warm and upbeat
  • fable: British accent, expressive
  • onyx: Deep and authoritative
  • nova: Energetic and friendly
  • shimmer: Soft and gentle
  • marin: Natural and conversational (default in NAVAI)
Set the voice in your backend options:
defaultVoice: "alloy"

Session instructions

The backend builds session instructions by combining base instructions with language, accent, and tone preferences:
// From packages/voice-backend/src/index.ts:134-158
function buildSessionInstructions(input: {
  baseInstructions: string;
  language?: string;
  voiceAccent?: string;
  voiceTone?: string;
}): string {
  const lines = [input.baseInstructions.trim()];
  const language = readOptional(input.language);
  const voiceAccent = readOptional(input.voiceAccent);
  const voiceTone = readOptional(input.voiceTone);

  if (language) {
    lines.push(`Always reply in ${language}.`);
  }

  if (voiceAccent) {
    lines.push(`Use a ${voiceAccent} accent while speaking.`);
  }

  if (voiceTone) {
    lines.push(`Use a ${voiceTone} tone while speaking.`);
  }

  return lines.join("\n");
}
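Running the function above on a sample input yields a newline-joined instruction string. readOptional is not shown in the source, so the version here is a plausible stand-in that trims values and drops empty strings:

```typescript
// Plausible stand-in for readOptional (not shown in the source):
// trim the value and treat empty strings as absent.
function readOptional(value?: string): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// buildSessionInstructions reproduced from the snippet above.
function buildSessionInstructions(input: {
  baseInstructions: string;
  language?: string;
  voiceAccent?: string;
  voiceTone?: string;
}): string {
  const lines = [input.baseInstructions.trim()];
  const language = readOptional(input.language);
  const voiceAccent = readOptional(input.voiceAccent);
  const voiceTone = readOptional(input.voiceTone);

  if (language) lines.push(`Always reply in ${language}.`);
  if (voiceAccent) lines.push(`Use a ${voiceAccent} accent while speaking.`);
  if (voiceTone) lines.push(`Use a ${voiceTone} tone while speaking.`);

  return lines.join("\n");
}

const instructions = buildSessionInstructions({
  baseInstructions: "You are a helpful in-app voice assistant.",
  language: "French",
  voiceTone: "calm",
});
// instructions now reads:
// You are a helpful in-app voice assistant.
// Always reply in French.
// Use a calm tone while speaking.
```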

Audio streaming and processing

The runtime handles bidirectional audio streaming between your app and OpenAI.

Web audio streaming

In web apps, the @openai/agents SDK handles audio capture and playback using the Web Audio API:
// Browser automatically handles:
// - Microphone access via getUserMedia
// - Audio encoding for WebRTC
// - Audio decoding and playback
// - Echo cancellation and noise suppression

Mobile audio streaming

Mobile apps require platform-specific audio handling. Capture microphone input using React Native modules:
// Using react-native-webrtc
const localStream = await mediaDevices.getUserMedia({
  audio: true,
  video: false
});

Error handling

The runtime provides comprehensive error handling at each stage:

Connection errors

// From packages/voice-frontend/src/useWebVoiceAgent.ts:137-148
catch (startError) {
  const message = formatError(startError);
  setError(message);
  setStatus("error");

  try {
    sessionRef.current?.close();
  } catch {
    // ignore close errors during bootstrap
  }
  sessionRef.current = null;
}
Common connection errors include:
  • Invalid client secret: Check your backend API key configuration
  • Network timeout: Verify network connectivity and firewall settings
  • WebRTC not supported: Ensure the browser/platform supports WebRTC
  • Microphone permission denied: Request microphone access before starting

Backend errors

The backend client handles HTTP errors gracefully:
// From packages/voice-frontend/src/backend.ts:94-96
if (!response.ok) {
  throw new Error(await readTextSafe(response));
}
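readTextSafe is referenced above but not shown. A plausible implementation, matching how it is used (an assumption, not the actual source), returns the response body when it can be read and falls back to a status message otherwise:

```typescript
// Plausible readTextSafe (assumption): never throw while extracting an
// error message from a failed HTTP response.
async function readTextSafe(response: {
  status: number;
  text: () => Promise<string>;
}): Promise<string> {
  try {
    const text = await response.text();
    return text.trim().length > 0
      ? text
      : `Request failed with status ${response.status}`;
  } catch {
    return `Request failed with status ${response.status}`;
  }
}
```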

Function execution errors

Function execution errors are caught and returned to the AI agent:
// From packages/voice-frontend/src/agent.ts:116-123
try {
  const result = await frontendDefinition.run(payload ?? {}, options);
  return { ok: true, function_name: frontendDefinition.name, source: frontendDefinition.source, result };
} catch (error) {
  return {
    ok: false,
    function_name: frontendDefinition.name,
    error: "Function execution failed.",
    details: toErrorMessage(error)
  };
}
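toErrorMessage is likewise not shown in the snippet. A minimal sketch consistent with its usage (an assumption) normalises any thrown value to a string:

```typescript
// Plausible toErrorMessage (assumption): unwrap Error instances,
// stringify everything else, so the agent always receives a string.
function toErrorMessage(error: unknown): string {
  if (error instanceof Error) {
    return error.message;
  }
  return String(error);
}
```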

Runtime warnings

The runtime collects warnings during initialization to help you debug configuration issues:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:44-50
function emitWarnings(warnings: string[]): void {
  for (const warning of warnings) {
    if (warning.trim().length > 0) {
      console.warn(warning);
    }
  }
}
Common warnings include:
  • Function name conflicts between frontend and backend
  • Invalid function names that can’t be used as tool names
  • Failed to load function modules
  • Routes file not found or invalid
Check your browser console for runtime warnings during development. They provide valuable debugging information without blocking execution.

Performance considerations

Connection latency

WebRTC provides low-latency audio streaming, typically 100-300ms round-trip. To optimize:
  • Use a backend server geographically close to your users
  • Request client secrets with appropriate TTL (longer for stable connections, shorter for security)
  • Reuse sessions instead of creating new connections frequently

Client secret TTL

Balance security and user experience when setting TTL:
// From packages/voice-backend/src/index.ts:22-23
const MIN_TTL_SECONDS = 10;
const MAX_TTL_SECONDS = 7200;
A short TTL is best for high-security applications and temporary voice features; the trade-off is that users may need to reconnect during long conversations. A longer TTL avoids reconnects for stable, long-lived sessions, at the cost of a wider window in which a leaked secret remains usable.
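A backend honoring these bounds would clamp any requested TTL into the allowed range before forwarding it; a hypothetical helper (clampTtlSeconds is not part of the actual source) shows the idea:

```typescript
const MIN_TTL_SECONDS = 10;
const MAX_TTL_SECONDS = 7200;

// Hypothetical helper: clamp a requested TTL into the supported range.
function clampTtlSeconds(requested: number): number {
  const whole = Math.floor(requested);
  return Math.min(MAX_TTL_SECONDS, Math.max(MIN_TTL_SECONDS, whole));
}
```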

Memory management

Properly clean up resources to prevent memory leaks:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:98-102
useEffect(() => {
  return () => {
    stop();
  };
}, [stop]);

Next steps

UI navigation

Learn how voice commands trigger UI navigation

Function execution

Understand how to define and execute functions

Backend setup

Configure your backend server

Frontend integration

Integrate the voice runtime in your React app
