The voice runtime is the core engine that powers NAVAI’s voice-first capabilities. It manages the connection to OpenAI’s Realtime API, handles bidirectional audio streaming, and coordinates tool execution.
How the runtime works
The voice runtime establishes a WebRTC connection between your application and OpenAI’s Realtime API, enabling low-latency voice interaction.
1. Initialize runtime configuration: load routes, functions, and environment settings to configure the voice agent.
2. Request client secret: your app calls your backend to generate an ephemeral OpenAI client secret.
3. Build agent with tools: create a RealtimeAgent with navigation and function execution tools.
4. Establish WebRTC connection: connect to OpenAI using the client secret and start audio streaming.
5. Handle voice interaction: process audio input, execute tools, and stream audio responses back to the user.
Web runtime
The web runtime uses the @openai/agents SDK with React hooks for state management.
Starting a voice session
The useWebVoiceAgent hook manages the complete session lifecycle:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:104-149
const start = useCallback(async (): Promise<void> => {
  if (status === "connecting" || status === "connected") {
    return;
  }
  setError(null);
  setStatus("connecting");
  try {
    // 1. Load runtime configuration (routes + functions)
    const runtimeConfig = await runtimeConfigPromise;

    // 2. Request client secret from backend
    const requestPayload = runtimeConfig.modelOverride
      ? { model: runtimeConfig.modelOverride }
      : {};
    const secretPayload = await backendClient.createClientSecret(requestPayload);

    // 3. Load backend functions list
    const backendFunctionsResult = await backendClient.listFunctions();

    // 4. Build agent with tools
    const { agent, warnings } = await buildNavaiAgent({
      navigate: options.navigate,
      routes: runtimeConfig.routes,
      functionModuleLoaders: runtimeConfig.functionModuleLoaders,
      backendFunctions: backendFunctionsResult.functions,
      executeBackendFunction: backendClient.executeFunction
    });
    emitWarnings([...runtimeConfig.warnings, ...backendFunctionsResult.warnings, ...warnings]);

    // 5. Create and connect session
    const session = new RealtimeSession(agent);
    if (runtimeConfig.modelOverride) {
      await session.connect({ apiKey: secretPayload.value, model: runtimeConfig.modelOverride });
    } else {
      await session.connect({ apiKey: secretPayload.value });
    }
    sessionRef.current = session;
    setStatus("connected");
  } catch (startError) {
    const message = formatError(startError);
    setError(message);
    setStatus("error");
    try {
      sessionRef.current?.close();
    } catch {
      // ignore close errors during bootstrap
    }
    sessionRef.current = null;
  }
}, [backendClient, options.navigate, runtimeConfigPromise, status]);
Session states
The runtime tracks connection state through a simple state machine:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:10
type VoiceStatus = "idle" | "connecting" | "connected" | "error";
idle: Initial state before any connection attempt. The agent is not running and no resources are allocated.
connecting: Actively requesting the client secret, building the agent, and establishing the WebRTC connection.
connected: WebRTC connection established successfully. Audio streaming and voice interaction are active.
error: Connection failed or was interrupted. Check the error property for details.
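The status union maps naturally onto UI copy. A small sketch (the labels and the statusLabel helper are illustrative, not part of the NAVAI API):

```typescript
type VoiceStatus = "idle" | "connecting" | "connected" | "error";

// Illustrative UI labels for each status (not part of the NAVAI API).
const STATUS_LABELS: Record<VoiceStatus, string> = {
  idle: "Tap to start",
  connecting: "Connecting…",
  connected: "Listening",
  error: "Something went wrong",
};

function statusLabel(status: VoiceStatus): string {
  return STATUS_LABELS[status];
}
```

Because the union is exhaustive, the compiler flags a missing label if a new status is ever added.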
Stopping the session
Cleanly close the connection and release resources:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:89-96
const stop = useCallback(() => {
  try {
    sessionRef.current?.close();
  } finally {
    sessionRef.current = null;
    setStatus("idle");
  }
}, []);
Always call stop() when unmounting your component or when the user navigates away to properly close the WebRTC connection and prevent memory leaks.
Mobile runtime
The mobile runtime provides additional flexibility through pluggable transport layers to support different WebRTC implementations.
Transport abstraction
Mobile apps use a transport interface that abstracts WebRTC implementation details:
// From packages/voice-mobile/src/transport.ts
export type NavaiRealtimeTransport = {
  connect: (options: NavaiRealtimeTransportConnectOptions) => Promise<void>;
  disconnect: () => void;
  state: NavaiRealtimeTransportState;
  onAudioData?: (data: ArrayBuffer) => void;
  onMessage?: (message: unknown) => void;
  onError?: (error: Error) => void;
  sendAudio: (data: ArrayBuffer) => void;
  sendMessage: (message: unknown) => void;
};
React Native WebRTC transport
NAVAI provides a pre-built transport for react-native-webrtc:
// From packages/voice-mobile/src/react-native-webrtc.ts
export function createReactNativeWebRtcTransport(
  options: CreateReactNativeWebRtcTransportOptions
): NavaiRealtimeTransport {
  // Create peer connection
  const peerConnection = new RTCPeerConnection({
    iceServers: options.iceServers,
  });
  // Setup audio tracks
  // Setup data channels
  // Handle connection lifecycle
  return transport;
}
You can implement your own transport to use different WebRTC libraries or custom audio streaming solutions.
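As a sketch of what a custom implementation involves, here is a minimal in-memory loopback transport that satisfies the interface. The state union and connect options are re-declared locally for self-containment and may differ from the real definitions in packages/voice-mobile/src/transport.ts:

```typescript
// Local stand-ins for types from packages/voice-mobile/src/transport.ts;
// the real definitions may differ.
type NavaiRealtimeTransportState = "idle" | "connecting" | "connected" | "closed";
type NavaiRealtimeTransportConnectOptions = { clientSecret: string };

type NavaiRealtimeTransport = {
  connect: (options: NavaiRealtimeTransportConnectOptions) => Promise<void>;
  disconnect: () => void;
  state: NavaiRealtimeTransportState;
  onAudioData?: (data: ArrayBuffer) => void;
  onMessage?: (message: unknown) => void;
  onError?: (error: Error) => void;
  sendAudio: (data: ArrayBuffer) => void;
  sendMessage: (message: unknown) => void;
};

// A loopback transport: everything sent is echoed straight back.
// Useful for exercising session logic in tests without any network.
function createLoopbackTransport(): NavaiRealtimeTransport {
  const transport: NavaiRealtimeTransport = {
    state: "idle",
    connect(_options) {
      transport.state = "connected";
      return Promise.resolve();
    },
    disconnect() {
      transport.state = "closed";
    },
    sendAudio(data) {
      // Echo the buffer back to whoever registered onAudioData.
      transport.onAudioData?.(data);
    },
    sendMessage(message) {
      transport.onMessage?.(message);
    },
  };
  return transport;
}
```

A real transport would do the same state bookkeeping while delegating connect/sendAudio to its WebRTC library of choice.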
Mobile session lifecycle
The mobile runtime uses createNavaiMobileVoiceSession to manage sessions:
// From packages/voice-mobile/src/session.ts
export function createNavaiMobileVoiceSession(
  options: CreateNavaiMobileVoiceSessionOptions
): NavaiMobileVoiceSession {
  const session = {
    async start(input: StartNavaiMobileVoiceSessionInput) {
      // 1. Request client secret
      // 2. Create agent runtime
      // 3. Connect transport
      // 4. Start audio streaming
    },
    stop() {
      // Disconnect transport and cleanup
    },
    getSnapshot(): NavaiMobileVoiceSessionSnapshot {
      return {
        state: currentState,
        error: currentError
      };
    }
  };
  return session;
}
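The skeleton pairs a start/stop lifecycle with a getSnapshot accessor. A self-contained sketch of that pattern follows; it is illustrative only, with the real connection work (secret request, agent runtime, transport, audio) replaced by a supplied callback:

```typescript
type SessionState = "idle" | "connecting" | "connected" | "error";

type SessionSnapshot = { state: SessionState; error: string | null };

// Illustrative stand-in for createNavaiMobileVoiceSession: the same
// snapshot pattern, with connection work injected as a callback.
function createSessionSketch(connect: () => Promise<void>) {
  let state: SessionState = "idle";
  let error: string | null = null;
  return {
    async start(): Promise<void> {
      state = "connecting";
      try {
        await connect(); // secret request, agent runtime, transport, audio
        state = "connected";
      } catch (err) {
        error = err instanceof Error ? err.message : String(err);
        state = "error";
      }
    },
    stop(): void {
      state = "idle";
      error = null;
    },
    getSnapshot(): SessionSnapshot {
      return { state, error };
    },
  };
}
```

Returning a fresh snapshot object on every call keeps consumers from mutating internal state, which is why the real API exposes getSnapshot rather than the state variables themselves.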
OpenAI Realtime API connection
The runtime connects to OpenAI’s Realtime API using ephemeral client secrets generated by your backend.
Client secret lifecycle
Backend requests client secret
Your server calls OpenAI’s /v1/realtime/client_secrets endpoint with your API key:
// From packages/voice-backend/src/index.ts:21
const OPENAI_CLIENT_SECRETS_URL = "https://api.openai.com/v1/realtime/client_secrets";
OpenAI returns ephemeral credential
OpenAI responds with a time-limited client secret:
// From packages/voice-backend/src/index.ts:52-56
export type OpenAIRealtimeClientSecretResponse = {
  value: string;
  expires_at: number;
  session?: unknown;
};
Client uses secret to connect
Your frontend or mobile app uses this secret as a temporary API key to establish a WebRTC connection.
Secret expires automatically
After the TTL expires (default 600 seconds), the secret becomes invalid and cannot be reused.
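Putting the lifecycle together, a backend helper might look like the sketch below. The URL constant and response type come from the excerpts above; the requestClientSecret name and the request body shape are assumptions (consult OpenAI’s Realtime API docs and the actual voice-backend source). fetchImpl is injectable so the helper can be exercised without network access:

```typescript
type OpenAIRealtimeClientSecretResponse = {
  value: string;
  expires_at: number;
  session?: unknown;
};

const OPENAI_CLIENT_SECRETS_URL =
  "https://api.openai.com/v1/realtime/client_secrets";

// Hypothetical helper: exchange a server-side API key for an ephemeral
// client secret. The body is abbreviated for illustration; see OpenAI's
// Realtime docs for the full set of session fields.
async function requestClientSecret(
  apiKey: string,
  model: string,
  fetchImpl: typeof fetch = fetch
): Promise<OpenAIRealtimeClientSecretResponse> {
  const response = await fetchImpl(OPENAI_CLIENT_SECRETS_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ session: { type: "realtime", model } }),
  });
  if (!response.ok) {
    throw new Error(`Client secret request failed: ${response.status}`);
  }
  return (await response.json()) as OpenAIRealtimeClientSecretResponse;
}
```

The server-side API key never leaves your backend; only the short-lived value field is forwarded to the client.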
Configuring session parameters
You can customize the AI model, voice, and instructions when requesting a client secret:
// From packages/voice-backend/src/index.ts:42-50
export type CreateClientSecretRequest = {
  model?: string;
  voice?: string;
  instructions?: string;
  language?: string;
  voiceAccent?: string;
  voiceTone?: string;
  apiKey?: string;
};
OpenAI provides several voice options:
alloy: Neutral and balanced
echo: Warm and upbeat
fable: British accent, expressive
onyx: Deep and authoritative
nova: Energetic and friendly
shimmer: Soft and gentle
marin: Natural and conversational (default in NAVAI)
Set the voice in your backend options when requesting the client secret.
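For illustration, a request payload using the CreateClientSecretRequest fields shown above; whether you pass these per request or configure server-side defaults depends on your backend setup:

```typescript
// Field names follow the CreateClientSecretRequest type shown above.
type CreateClientSecretRequest = {
  model?: string;
  voice?: string;
  instructions?: string;
  language?: string;
  voiceAccent?: string;
  voiceTone?: string;
  apiKey?: string;
};

const request: CreateClientSecretRequest = {
  voice: "marin",          // NAVAI's default voice
  language: "English",
  voiceTone: "friendly",
};
```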
Session instructions
The backend builds session instructions by combining base instructions with language, accent, and tone preferences:
// From packages/voice-backend/src/index.ts:134-158
function buildSessionInstructions(input: {
  baseInstructions: string;
  language?: string;
  voiceAccent?: string;
  voiceTone?: string;
}): string {
  const lines = [input.baseInstructions.trim()];
  const language = readOptional(input.language);
  const voiceAccent = readOptional(input.voiceAccent);
  const voiceTone = readOptional(input.voiceTone);
  if (language) {
    lines.push(`Always reply in ${language}.`);
  }
  if (voiceAccent) {
    lines.push(`Use a ${voiceAccent} accent while speaking.`);
  }
  if (voiceTone) {
    lines.push(`Use a ${voiceTone} tone while speaking.`);
  }
  return lines.join("\n");
}
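To see the composition concretely, here is the function exercised end to end. readOptional is referenced but not shown in the excerpt, so the stand-in below, which treats empty or whitespace-only strings as absent, is an assumption:

```typescript
// Assumed stand-in for readOptional (not shown in the source excerpt):
// treats empty/whitespace-only strings as absent.
function readOptional(value?: string): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

function buildSessionInstructions(input: {
  baseInstructions: string;
  language?: string;
  voiceAccent?: string;
  voiceTone?: string;
}): string {
  const lines = [input.baseInstructions.trim()];
  const language = readOptional(input.language);
  const voiceAccent = readOptional(input.voiceAccent);
  const voiceTone = readOptional(input.voiceTone);
  if (language) {
    lines.push(`Always reply in ${language}.`);
  }
  if (voiceAccent) {
    lines.push(`Use a ${voiceAccent} accent while speaking.`);
  }
  if (voiceTone) {
    lines.push(`Use a ${voiceTone} tone while speaking.`);
  }
  return lines.join("\n");
}

const instructions = buildSessionInstructions({
  baseInstructions: "You are a helpful in-app assistant.",
  language: "French",
  voiceTone: "calm",
});
// Produces three lines: the base prompt, the language rule, the tone rule.
```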
Audio streaming and processing
The runtime handles bidirectional audio streaming between your app and OpenAI.
Web audio streaming
In web apps, the @openai/agents SDK handles audio capture and playback using the Web Audio API:
// Browser automatically handles:
// - Microphone access via getUserMedia
// - Audio encoding for WebRTC
// - Audio decoding and playback
// - Echo cancellation and noise suppression
Mobile audio streaming
Mobile apps require platform-specific audio handling:
Audio capture
Capture microphone input using React Native modules:
// Using react-native-webrtc
const localStream = await mediaDevices.getUserMedia({
  audio: true,
  video: false
});

Audio transmission
Send audio data through the WebRTC data channel:
transport.sendAudio(audioBuffer);

Audio playback
Receive and play audio from OpenAI:
transport.onAudioData = (data) => {
  // Decode and play audio buffer
  playAudioBuffer(data);
};
Error handling
The runtime provides comprehensive error handling at each stage:
Connection errors
// From packages/voice-frontend/src/useWebVoiceAgent.ts:137-148
catch (startError) {
  const message = formatError(startError);
  setError(message);
  setStatus("error");
  try {
    sessionRef.current?.close();
  } catch {
    // ignore close errors during bootstrap
  }
  sessionRef.current = null;
}
Common connection errors include:
Invalid client secret: Check your backend API key configuration
Network timeout: Verify network connectivity and firewall settings
WebRTC not supported: Ensure the browser/platform supports WebRTC
Microphone permission denied: Request microphone access before starting
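For the last item, you can pre-flight the microphone permission before calling start(). The sketch below takes the getUserMedia function as a parameter (in a browser, pass (c) => navigator.mediaDevices.getUserMedia(c)); the structural types keep it independent of DOM typings and are assumptions for illustration:

```typescript
type TrackLike = { stop(): void };
type StreamLike = { getTracks(): TrackLike[] };

// Pre-flight check: request the microphone, then release it immediately so
// the voice SDK can open its own stream later. Returns false on denial.
async function ensureMicrophonePermission(
  getUserMedia: (constraints: { audio: boolean }) => Promise<StreamLike>
): Promise<boolean> {
  try {
    const stream = await getUserMedia({ audio: true });
    for (const track of stream.getTracks()) {
      track.stop();
    }
    return true;
  } catch {
    return false;
  }
}
```

Running this on a user gesture (for example, the button that starts the session) avoids permission prompts appearing mid-connection.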
Backend errors
The backend client handles HTTP errors gracefully:
// From packages/voice-frontend/src/backend.ts:94-96
if (!response.ok) {
  throw new Error(await readTextSafe(response));
}
Function execution errors
Function execution errors are caught and returned to the AI agent:
// From packages/voice-frontend/src/agent.ts:116-123
try {
  const result = await frontendDefinition.run(payload ?? {}, options);
  return { ok: true, function_name: frontendDefinition.name, source: frontendDefinition.source, result };
} catch (error) {
  return {
    ok: false,
    function_name: frontendDefinition.name,
    error: "Function execution failed.",
    details: toErrorMessage(error)
  };
}
Runtime warnings
The runtime collects warnings during initialization to help you debug configuration issues:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:44-50
function emitWarnings(warnings: string[]): void {
  for (const warning of warnings) {
    if (warning.trim().length > 0) {
      console.warn(warning);
    }
  }
}
Common warnings include:
Function name conflicts between frontend and backend
Invalid function names that can’t be used as tool names
Failed to load function modules
Routes file not found or invalid
Check your browser console for runtime warnings during development. They provide valuable debugging information without blocking execution.
Connection latency
WebRTC provides low-latency audio streaming, typically 100-300ms round-trip. To optimize:
Use a backend server geographically close to your users
Request client secrets with appropriate TTL (longer for stable connections, shorter for security)
Reuse sessions instead of creating new connections frequently
Client secret TTL
Balance security and user experience when setting TTL:
// From packages/voice-backend/src/index.ts:22-23
const MIN_TTL_SECONDS = 10;
const MAX_TTL_SECONDS = 7200;
Short TTL (60-300s)
Best for: High-security applications, temporary voice features
Trade-offs: May need to reconnect during long conversations

Medium TTL (600-1800s)
Best for: Most applications (default is 600s)
Trade-offs: Good balance of security and user experience

Long TTL (3600-7200s)
Best for: Extended voice sessions, reduced backend calls
Trade-offs: Longer exposure window if credentials are compromised
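A backend will typically clamp requested TTLs into the allowed range. The clampTtlSeconds helper below is illustrative; the constants mirror the bounds shown above, but the actual voice-backend clamping logic may differ:

```typescript
const MIN_TTL_SECONDS = 10;
const MAX_TTL_SECONDS = 7200;

// Hypothetical helper: clamp a requested TTL into the accepted range,
// falling back to the 600s default for non-numeric input.
function clampTtlSeconds(requested: number, fallback = 600): number {
  if (!Number.isFinite(requested)) {
    return fallback;
  }
  return Math.min(MAX_TTL_SECONDS, Math.max(MIN_TTL_SECONDS, Math.round(requested)));
}
```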
Memory management
Properly clean up resources to prevent memory leaks:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:98-102
useEffect(() => {
  return () => {
    stop();
  };
}, [stop]);
Next steps
UI navigation: Learn how voice commands trigger UI navigation
Function execution: Understand how to define and execute functions
Backend setup: Configure your backend server
Frontend integration: Integrate the voice runtime in your React app