NAVAI is built as a modular SDK with three core packages that work together to enable voice-first AI navigation and function execution in your applications.
Package overview
The NAVAI architecture consists of three main packages:
voice-backend: Node.js backend for OpenAI Realtime API integration and server-side function execution
voice-frontend: Web runtime for React applications with voice navigation and function calling
voice-mobile: React Native runtime with pluggable WebRTC transport for mobile apps
How the packages interact
Client requests connection
Your frontend or mobile app requests a client secret from your backend server to establish a secure connection with OpenAI’s Realtime API.
Backend issues ephemeral credentials
The backend server calls OpenAI to generate a short-lived client secret and returns it to your client app.
Client connects to OpenAI
Your app uses the client secret to establish a WebRTC connection directly with OpenAI’s Realtime API.
Voice interaction begins
Audio streams bidirectionally between your app and OpenAI, with the AI agent able to call navigation and function tools.
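The handshake above can be sketched from the client's side. The endpoint path below matches the backend package's default route; the response shape (a `value` field holding the secret) and the `backendBaseUrl` handling are assumptions for illustration, not the package's documented types.

```typescript
// Client-side sketch of steps 1-3 above. The response shape is an assumption
// for illustration, not the package's documented type.
export type ClientSecretResponse = { value: string; expires_at: number };

export async function requestClientSecret(backendBaseUrl: string): Promise<ClientSecretResponse> {
  // Steps 1-2: ask YOUR backend (never OpenAI directly) for an ephemeral secret.
  const res = await fetch(`${backendBaseUrl}/navai/realtime/client-secret`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({})
  });
  if (!res.ok) throw new Error(`client-secret request failed (${res.status})`);
  // Step 3 would hand this secret to the WebRTC connection with OpenAI.
  return (await res.json()) as ClientSecretResponse;
}
```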
Backend package
The @navai/voice-backend package provides Express.js middleware and utilities for integrating OpenAI’s Realtime API into your Node.js server.
Key responsibilities
Client secret generation: Creates ephemeral credentials for secure client connections
Backend function registration: Exposes server-side functions that the AI can execute
Function execution: Handles execution of backend functions requested by the AI agent
Configuration management: Manages API keys, model settings, and session parameters
Core routes
When you call registerNavaiExpressRoutes(), three HTTP endpoints are automatically registered:
// From packages/voice-backend/src/index.ts:24-26
const DEFAULT_CLIENT_SECRET_PATH = "/navai/realtime/client-secret";
const DEFAULT_FUNCTIONS_LIST_PATH = "/navai/functions";
const DEFAULT_FUNCTIONS_EXECUTE_PATH = "/navai/functions/execute";
POST /navai/realtime/client-secret: Generates an ephemeral client secret for connecting to OpenAI’s Realtime API. The secret expires after a configurable TTL (default 600 seconds).
GET /navai/functions: Returns a list of available backend functions that the AI agent can execute.
POST /navai/functions/execute: Executes a backend function by name with the provided payload.
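As a dependency-free sketch of what that registration amounts to, the snippet below wires the three default paths onto a minimal router stand-in. The mini router is purely illustrative; the real registerNavaiExpressRoutes() takes an Express app plus options, and its handler logic is omitted here.

```typescript
// Minimal sketch of the routes registerNavaiExpressRoutes() sets up. The mini
// router below stands in for an Express app; handlers are stubbed out.
type Handler = (req: unknown, res: unknown) => void;
export type MiniRouter = {
  routes: Array<{ method: string; path: string }>;
  post(path: string, handler: Handler): void;
  get(path: string, handler: Handler): void;
};

export function createMiniRouter(): MiniRouter {
  const routes: Array<{ method: string; path: string }> = [];
  return {
    routes,
    post(path) { routes.push({ method: "POST", path }); },
    get(path) { routes.push({ method: "GET", path }); }
  };
}

export function registerDefaultNavaiRoutes(app: MiniRouter): void {
  const noop: Handler = () => {};
  app.post("/navai/realtime/client-secret", noop); // ephemeral credentials
  app.get("/navai/functions", noop);               // list backend functions
  app.post("/navai/functions/execute", noop);      // execute one by name
}
```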
Client secret flow
The backend creates client secrets by calling OpenAI’s client secrets endpoint:
// From packages/voice-backend/src/index.ts:160-205
export async function createRealtimeClientSecret(
  opts: NavaiVoiceBackendOptions,
  req?: CreateClientSecretRequest
): Promise<OpenAIRealtimeClientSecretResponse> {
  validateOptions(opts);
  const apiKey = resolveApiKey(opts, req);
  const model = req?.model ?? opts.defaultModel ?? "gpt-realtime";
  const voice = req?.voice ?? opts.defaultVoice ?? "marin";
  const baseInstructions = req?.instructions ?? opts.defaultInstructions ?? "You are a helpful assistant.";
  const instructions = buildSessionInstructions({
    baseInstructions,
    language: req?.language ?? opts.defaultLanguage,
    voiceAccent: req?.voiceAccent ?? opts.defaultVoiceAccent,
    voiceTone: req?.voiceTone ?? opts.defaultVoiceTone
  });
  const ttl = opts.clientSecretTtlSeconds ?? 600;
  const body = {
    expires_after: { anchor: "created_at", seconds: ttl },
    session: {
      type: "realtime",
      model,
      instructions,
      audio: {
        output: { voice }
      }
    }
  };
  const response = await fetch(OPENAI_CLIENT_SECRETS_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify(body)
  });
  if (!response.ok) {
    const message = await response.text();
    throw new Error(`OpenAI client_secrets failed (${response.status}): ${message}`);
  }
  return (await response.json()) as OpenAIRealtimeClientSecretResponse;
}
The client secret flow ensures your OpenAI API key never leaves your backend server, maintaining security while allowing direct client-to-OpenAI communication.
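The request body that function sends to OpenAI can be reduced to a pure builder, which makes the TTL and session defaults easy to check without any network call. Field names and default values below follow the snippet above; the builder itself is illustrative, not part of the package.

```typescript
// Pure builder mirroring the client_secrets request body shown above, so the
// TTL and session defaults can be verified without network access.
export function buildClientSecretBody(opts: {
  ttlSeconds?: number;
  model?: string;
  voice?: string;
  instructions?: string;
}) {
  return {
    expires_after: { anchor: "created_at", seconds: opts.ttlSeconds ?? 600 },
    session: {
      type: "realtime",
      model: opts.model ?? "gpt-realtime",
      instructions: opts.instructions ?? "You are a helpful assistant.",
      audio: { output: { voice: opts.voice ?? "marin" } }
    }
  };
}
```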
Frontend package
The @navai/voice-frontend package provides React hooks and runtime utilities for building voice-enabled web applications.
Key components
useWebVoiceAgent hook
The main React hook that manages the voice session lifecycle:
// From packages/voice-frontend/src/useWebVoiceAgent.ts:14-34
export type UseWebVoiceAgentOptions = {
  navigate: (path: string) => void;
  moduleLoaders: NavaiFunctionModuleLoaders;
  defaultRoutes: NavaiRoute[];
  env?: NavaiFrontendEnv;
  apiBaseUrl?: string;
  routesFile?: string;
  functionsFolders?: string;
  modelOverride?: string;
  defaultRoutesFile?: string;
  defaultFunctionsFolder?: string;
};

export type UseWebVoiceAgentResult = {
  status: VoiceStatus;
  error: string | null;
  isConnecting: boolean;
  isConnected: boolean;
  start: () => Promise<void>;
  stop: () => void;
};
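A typical UI pairs the result's status flags with a mic toggle. The pure helper below shows one way to derive a button label from those flags; the flag names mirror the result type above, while the labels themselves are an arbitrary choice for this sketch.

```typescript
// Derive a mic-button label from the hook's status flags. Flag names mirror
// UseWebVoiceAgentResult above; the label strings are arbitrary.
export type AgentState = { isConnecting: boolean; isConnected: boolean };

export function micButtonLabel(state: AgentState): string {
  if (state.isConnecting) return "Connecting...";
  return state.isConnected ? "Stop voice" : "Start voice";
}
```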
Backend client
Handles communication with your backend server:
// From packages/voice-frontend/src/backend.ts:38-42
export type NavaiBackendClient = {
  createClientSecret: (input?: CreateClientSecretInput) => Promise<CreateClientSecretOutput>;
  listFunctions: () => Promise<BackendFunctionsResult>;
  executeFunction: ExecuteNavaiBackendFunction;
};
Agent building process
The frontend builds an OpenAI Realtime agent with navigation and function execution capabilities:
// From packages/voice-frontend/src/agent.ts:47-250
export async function buildNavaiAgent(options: BuildNavaiAgentOptions): Promise<BuildNavaiAgentResult> {
  const functionsRegistry = await loadNavaiFunctions(options.functionModuleLoaders ?? {});

  // Load backend functions
  const backendFunctionsByName = new Map<string, NavaiBackendFunctionDefinition>();
  // ... backend function registration

  // Create navigation tool
  const navigateTool = tool({
    name: "navigate_to",
    description: "Navigate to an allowed route in the current app.",
    parameters: z.object({
      target: z
        .string()
        .min(1)
        .describe("Route name or route path. Example: perfil, ajustes, /profile, /settings")
    }),
    execute: async ({ target }) => {
      const path = resolveNavaiRoute(target, options.routes);
      if (!path) {
        return { ok: false, error: "Unknown or disallowed route." };
      }
      options.navigate(path);
      return { ok: true, path };
    }
  });

  // Create function execution tool
  const executeFunctionTool = tool({
    name: "execute_app_function",
    description: "Execute an allowed internal app function by name.",
    // ... implementation
  });

  // Build agent with instructions
  const agent = new RealtimeAgent({
    name: options.agentName ?? "Navai Voice Agent",
    instructions,
    tools: [navigateTool, executeFunctionTool, ...directFunctionTools]
  });

  return { agent, warnings };
}
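The matching rules inside resolveNavaiRoute are not shown above; a plausible sketch, matching the spoken target against route names or exact paths on the allowlist, looks like this. The real implementation may normalize input differently.

```typescript
// Hypothetical sketch of route resolution for the navigate_to tool: accept a
// route name or a path, but only ever return paths from the allowlist.
export type NavaiRoute = { name: string; path: string };

export function resolveRouteSketch(target: string, routes: NavaiRoute[]): string | null {
  const t = target.trim().toLowerCase();
  for (const route of routes) {
    if (route.name.toLowerCase() === t || route.path.toLowerCase() === t) {
      return route.path;
    }
  }
  return null; // unknown or disallowed route
}
```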
Mobile package
The @navai/voice-mobile package provides React Native support with pluggable transport layers.
Key features
React Native WebRTC integration: Native WebRTC support for mobile devices
Pluggable transport: Abstraction layer allowing different transport implementations
Platform-specific audio: Handles iOS and Android audio streaming differences
Session management: Mobile-optimized session lifecycle management
Mobile architecture differences
Unlike the web package, which uses the OpenAI SDK’s built-in WebRTC support, the mobile package provides:
A NavaiRealtimeTransport interface that can be implemented with different WebRTC libraries:
// From packages/voice-mobile/src/transport.ts
export type NavaiRealtimeTransport = {
  connect: (options: NavaiRealtimeTransportConnectOptions) => Promise<void>;
  disconnect: () => void;
  state: NavaiRealtimeTransportState;
  // ... event handlers
};
A pre-built adapter for the popular react-native-webrtc library:
// From packages/voice-mobile/src/react-native-webrtc.ts
export function createReactNativeWebRtcTransport(
  options: CreateReactNativeWebRtcTransportOptions
): NavaiRealtimeTransport {
  // Implementation using react-native-webrtc
}
React hooks optimized for mobile use cases:
export function useMobileVoiceAgent(
  options: UseMobileVoiceAgentOptions
): UseMobileVoiceAgentResult {
  // Mobile-optimized session management
}
Function execution architecture
Functions can be executed in two locations:
Frontend functions
Executed directly in the client application with access to the navigation context:
// From packages/voice-frontend/src/functions.ts:3-5
export type NavaiFunctionContext = {
  navigate: (path: string) => void;
};
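A frontend function therefore only needs the injected navigate callback to act. The function below is a hypothetical example; the module and registration shape the package expects is not shown here.

```typescript
// Hypothetical frontend function using the context type above: it navigates
// via the injected callback and reports the result. The registration shape
// the package expects is an assumption not shown here.
export type NavaiFunctionContext = { navigate: (path: string) => void };

export function goToSettings(ctx: NavaiFunctionContext): { ok: boolean; path: string } {
  ctx.navigate("/settings");
  return { ok: true, path: "/settings" };
}
```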
Backend functions
Executed on the server with access to backend resources:
// From packages/voice-backend/src/functions.ts:9
export type NavaiFunctionContext = Record<string, unknown>;
Frontend functions have priority. If a function with the same name exists in both frontend and backend, only the frontend version will be called.
Function execution flow
AI agent calls tool
OpenAI’s Realtime API calls execute_app_function or a direct function tool.
Frontend checks local registry
The frontend first checks if the function is registered locally.
Execute frontend or backend
If found locally, execute immediately. Otherwise, make an HTTP call to the backend’s /navai/functions/execute endpoint.
Return result to AI
The function result is sent back to OpenAI, which may speak the result or use it to inform the next action.
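The frontend-first rule in this flow can be sketched as a small resolver: a local registry shadows same-named backend functions, and anything else falls through to the backend's execute endpoint. The executor signatures below are illustrative, not the package's real types.

```typescript
// Sketch of frontend-first function resolution: local registry wins on a
// name clash; otherwise fall back to the backend executor. Types are
// illustrative, not the package's.
export type Executor = (payload: unknown) => Promise<unknown>;

export function resolveExecutor(
  name: string,
  frontend: Map<string, Executor>,
  callBackend: (name: string, payload: unknown) => Promise<unknown>
): Executor {
  const local = frontend.get(name);
  if (local) return local;                        // frontend shadows backend
  return (payload) => callBackend(name, payload); // HTTP call to /navai/functions/execute
}
```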
Configuration and environment
All packages support configuration via environment variables and explicit options:
Backend environment
OPENAI_API_KEY=sk-...
OPENAI_REALTIME_MODEL=gpt-4o-realtime-preview
OPENAI_REALTIME_VOICE=alloy
OPENAI_REALTIME_INSTRUCTIONS="You are a helpful assistant."
OPENAI_REALTIME_CLIENT_SECRET_TTL=600
NAVAI_FUNCTIONS_FOLDERS="src/ai/functions-modules"
NAVAI_ALLOW_FRONTEND_API_KEY=false
Security considerations
Never expose your OpenAI API key to client applications. Always use the backend client secret flow to generate ephemeral credentials.
The architecture is designed with security in mind:
Server-side API key storage: Your OpenAI API key lives only on your backend
Ephemeral client secrets: Short-lived credentials that expire automatically
Function allowlisting: Only explicitly registered functions can be executed
Route validation: Navigation is restricted to predefined routes
Backend-only sensitive operations: Database queries and external API calls stay on the server
Next steps
Voice runtime: Learn how the voice runtime manages OpenAI Realtime API connections
UI navigation: Understand how voice-controlled navigation works
Function execution: Explore how functions are defined and executed
Getting started: Build your first voice-enabled app