NAVAI is built as a modular SDK with three core packages that work together to enable voice-first AI navigation and function execution in your applications.

Package overview

The NAVAI architecture consists of three main packages:

  • voice-backend: Node.js backend for OpenAI Realtime API integration and server-side function execution
  • voice-frontend: Web runtime for React applications with voice navigation and function calling
  • voice-mobile: React Native runtime with pluggable WebRTC transport for mobile apps

How the packages interact

1. Client requests connection: Your frontend or mobile app requests a client secret from your backend server to establish a secure connection with OpenAI’s Realtime API.
2. Backend issues ephemeral credentials: The backend server calls OpenAI to generate a short-lived client secret and returns it to your client app.
3. Client connects to OpenAI: Your app uses the client secret to establish a WebRTC connection directly with OpenAI’s Realtime API.
4. Voice interaction begins: Audio streams bidirectionally between your app and OpenAI, with the AI agent able to call navigation and function tools.
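The handshake above can be sketched end to end. Everything here is illustrative: `issueClientSecret` stands in for your backend route, the OpenAI call is stubbed out, and only the TTL arithmetic mirrors the real flow.

```typescript
// Hypothetical shapes for the flow: the client asks the backend for an
// ephemeral secret, then uses it (while unexpired) to dial OpenAI directly.
type ClientSecret = { value: string; expiresAt: number };

// Step 2: backend mints a short-lived secret (the OpenAI call is stubbed here).
function issueClientSecret(ttlSeconds: number, now: number = Date.now()): ClientSecret {
  return { value: "ek_stub_secret", expiresAt: now + ttlSeconds * 1000 };
}

// Step 3: the client treats an unexpired secret as usable for the WebRTC dial.
function isUsable(secret: ClientSecret, now: number = Date.now()): boolean {
  return secret.expiresAt > now;
}

const secret = issueClientSecret(600, 0);
console.log(isUsable(secret, 599_000)); // true: still inside the 600 s TTL
console.log(isUsable(secret, 600_001)); // false: expired
```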

Backend package

The @navai/voice-backend package provides Express.js middleware and utilities for integrating OpenAI’s Realtime API into your Node.js server.

Key responsibilities

  • Client secret generation: Creates ephemeral credentials for secure client connections
  • Backend function registration: Exposes server-side functions that the AI can execute
  • Function execution: Handles execution of backend functions requested by the AI agent
  • Configuration management: Manages API keys, model settings, and session parameters

Core routes

When you call registerNavaiExpressRoutes(), three HTTP endpoints are automatically registered:
```typescript
// From packages/voice-backend/src/index.ts:24-26
const DEFAULT_CLIENT_SECRET_PATH = "/navai/realtime/client-secret";
const DEFAULT_FUNCTIONS_LIST_PATH = "/navai/functions";
const DEFAULT_FUNCTIONS_EXECUTE_PATH = "/navai/functions/execute";
```

  • POST /navai/realtime/client-secret: Generates an ephemeral client secret for connecting to OpenAI’s Realtime API. The secret expires after a configurable TTL (default 600 seconds).
  • GET /navai/functions: Returns a list of available backend functions that the AI agent can execute.
  • POST /navai/functions/execute: Executes a backend function by name with the provided payload.

Client secret flow

The backend creates client secrets by calling OpenAI’s client secrets endpoint:
```typescript
// From packages/voice-backend/src/index.ts:160-205
export async function createRealtimeClientSecret(
  opts: NavaiVoiceBackendOptions,
  req?: CreateClientSecretRequest
): Promise<OpenAIRealtimeClientSecretResponse> {
  validateOptions(opts);
  const apiKey = resolveApiKey(opts, req);

  const model = req?.model ?? opts.defaultModel ?? "gpt-realtime";
  const voice = req?.voice ?? opts.defaultVoice ?? "marin";
  const baseInstructions = req?.instructions ?? opts.defaultInstructions ?? "You are a helpful assistant.";
  const instructions = buildSessionInstructions({
    baseInstructions,
    language: req?.language ?? opts.defaultLanguage,
    voiceAccent: req?.voiceAccent ?? opts.defaultVoiceAccent,
    voiceTone: req?.voiceTone ?? opts.defaultVoiceTone
  });
  const ttl = opts.clientSecretTtlSeconds ?? 600;

  const body = {
    expires_after: { anchor: "created_at", seconds: ttl },
    session: {
      type: "realtime",
      model,
      instructions,
      audio: {
        output: { voice }
      }
    }
  };

  const response = await fetch(OPENAI_CLIENT_SECRETS_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify(body)
  });

  if (!response.ok) {
    const message = await response.text();
    throw new Error(`OpenAI client_secrets failed (${response.status}): ${message}`);
  }

  return (await response.json()) as OpenAIRealtimeClientSecretResponse;
}
```
The client secret flow ensures your OpenAI API key never leaves your backend server, maintaining security while allowing direct client-to-OpenAI communication.
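The defaulting logic inside that function can be isolated as a pure helper. This is an illustrative reimplementation, not the package's API: the function name and option shape are hypothetical, and only the precedence (request value, then configured default, then hard-coded fallback) and the body layout mirror the source.

```typescript
// Hypothetical option shape; mirrors the precedence used when building the
// client-secret request body: request wins, then defaults, then fallbacks.
type SecretOptions = { model?: string; voice?: string; ttlSeconds?: number };

function buildClientSecretBody(req: SecretOptions = {}, defaults: SecretOptions = {}) {
  const model = req.model ?? defaults.model ?? "gpt-realtime";
  const voice = req.voice ?? defaults.voice ?? "marin";
  const ttl = req.ttlSeconds ?? defaults.ttlSeconds ?? 600;
  return {
    expires_after: { anchor: "created_at", seconds: ttl },
    session: {
      type: "realtime",
      model,
      audio: { output: { voice } }
    }
  };
}

const body = buildClientSecretBody({ voice: "alloy" }, { ttlSeconds: 300 });
console.log(body.expires_after.seconds);      // 300
console.log(body.session.audio.output.voice); // "alloy"
console.log(body.session.model);              // "gpt-realtime"
```

Keeping the body builder pure makes the TTL and voice precedence easy to unit-test without touching the network.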

Frontend package

The @navai/voice-frontend package provides React hooks and runtime utilities for building voice-enabled web applications.

Key components

useWebVoiceAgent hook

The main React hook that manages the voice session lifecycle:

```typescript
// From packages/voice-frontend/src/useWebVoiceAgent.ts:14-34
export type UseWebVoiceAgentOptions = {
  navigate: (path: string) => void;
  moduleLoaders: NavaiFunctionModuleLoaders;
  defaultRoutes: NavaiRoute[];
  env?: NavaiFrontendEnv;
  apiBaseUrl?: string;
  routesFile?: string;
  functionsFolders?: string;
  modelOverride?: string;
  defaultRoutesFile?: string;
  defaultFunctionsFolder?: string;
};

export type UseWebVoiceAgentResult = {
  status: VoiceStatus;
  error: string | null;
  isConnecting: boolean;
  isConnected: boolean;
  start: () => Promise<void>;
  stop: () => void;
};
```
Backend client

Handles communication with your backend server:

```typescript
// From packages/voice-frontend/src/backend.ts:38-42
export type NavaiBackendClient = {
  createClientSecret: (input?: CreateClientSecretInput) => Promise<CreateClientSecretOutput>;
  listFunctions: () => Promise<BackendFunctionsResult>;
  executeFunction: ExecuteNavaiBackendFunction;
};
```
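A client with this shape is a thin wrapper over the three backend routes. The sketch below is an assumption about how such a wrapper could look, not the package's implementation; the fetch function is injectable so the example can run against a stub instead of a live server.

```typescript
// Minimal fetch abstraction so the sketch needs no network access.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<{ ok: boolean; json: () => Promise<unknown> }>;

// Hypothetical factory: wires the three default routes to HTTP calls.
function createBackendClient(baseUrl: string, fetchImpl: FetchLike) {
  const post = async (path: string, payload: unknown) => {
    const res = await fetchImpl(`${baseUrl}${path}`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload)
    });
    if (!res.ok) throw new Error(`Request to ${path} failed`);
    return res.json();
  };
  return {
    createClientSecret: (input: unknown = {}) =>
      post("/navai/realtime/client-secret", input),
    listFunctions: async () => {
      const res = await fetchImpl(`${baseUrl}/navai/functions`);
      return res.json();
    },
    executeFunction: (name: string, payload: unknown) =>
      post("/navai/functions/execute", { name, payload })
  };
}
```

Injecting the fetch implementation keeps the wrapper testable and lets React Native or SSR environments supply their own HTTP layer.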

Agent building process

The frontend builds an OpenAI Realtime agent with navigation and function execution capabilities:
```typescript
// From packages/voice-frontend/src/agent.ts:47-250
export async function buildNavaiAgent(options: BuildNavaiAgentOptions): Promise<BuildNavaiAgentResult> {
  const functionsRegistry = await loadNavaiFunctions(options.functionModuleLoaders ?? {});

  // Load backend functions
  const backendFunctionsByName = new Map<string, NavaiBackendFunctionDefinition>();
  // ... backend function registration

  // Create navigation tool
  const navigateTool = tool({
    name: "navigate_to",
    description: "Navigate to an allowed route in the current app.",
    parameters: z.object({
      target: z
        .string()
        .min(1)
        .describe("Route name or route path. Example: perfil, ajustes, /profile, /settings")
    }),
    execute: async ({ target }) => {
      const path = resolveNavaiRoute(target, options.routes);
      if (!path) {
        return { ok: false, error: "Unknown or disallowed route." };
      }

      options.navigate(path);
      return { ok: true, path };
    }
  });

  // Create function execution tool
  const executeFunctionTool = tool({
    name: "execute_app_function",
    description: "Execute an allowed internal app function by name.",
    // ... implementation
  });

  // Build agent with instructions
  const agent = new RealtimeAgent({
    name: options.agentName ?? "Navai Voice Agent",
    instructions,
    tools: [navigateTool, executeFunctionTool, ...directFunctionTools]
  });

  return { agent, warnings };
}
```
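The excerpt calls resolveNavaiRoute without showing it. A plausible reimplementation, sketched here under assumed names and behavior, matches the spoken target against either a route name or an exact path on the allowlist and returns null for anything else, which is what makes the navigate_to tool reject disallowed routes.

```typescript
// Assumed route shape: a spoken-friendly name plus the app path it maps to.
type Route = { name: string; path: string };

// Hypothetical resolver: accept an exact path or a case-insensitive route
// name; anything off the allowlist resolves to null.
function resolveRoute(target: string, routes: Route[]): string | null {
  const t = target.trim().toLowerCase();
  for (const route of routes) {
    if (route.path.toLowerCase() === t || route.name.toLowerCase() === t) {
      return route.path;
    }
  }
  return null;
}

const routes: Route[] = [
  { name: "perfil", path: "/profile" },
  { name: "ajustes", path: "/settings" }
];
console.log(resolveRoute("perfil", routes)); // "/profile"
console.log(resolveRoute("/admin", routes)); // null
```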

Mobile package

The @navai/voice-mobile package provides React Native support with pluggable transport layers.

Key features

  • React Native WebRTC integration: Native WebRTC support for mobile devices
  • Pluggable transport: Abstraction layer allowing different transport implementations
  • Platform-specific audio: Handles iOS and Android audio streaming differences
  • Session management: Mobile-optimized session lifecycle management

Mobile architecture differences

Unlike the web package, which uses the OpenAI SDK’s built-in WebRTC support, the mobile package provides a NavaiRealtimeTransport interface that can be implemented with different WebRTC libraries:

```typescript
// From packages/voice-mobile/src/transport.ts
export type NavaiRealtimeTransport = {
  connect: (options: NavaiRealtimeTransportConnectOptions) => Promise<void>;
  disconnect: () => void;
  state: NavaiRealtimeTransportState;
  // ... event handlers
};
```

Function execution architecture

Functions can be executed in two locations:

Frontend functions

Executed directly in the client application with access to the navigation context:
```typescript
// From packages/voice-frontend/src/functions.ts:3-5
export type NavaiFunctionContext = {
  navigate: (path: string) => void;
};
```
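A frontend function receives this context when it runs. The module shape below (a named object with an execute method) is an assumption for illustration; only the context type comes from the source.

```typescript
// Context type from the frontend package.
type NavaiFunctionContext = { navigate: (path: string) => void };

// Hypothetical frontend function: runs in the browser and can drive
// navigation through the context it is handed.
const openSettings = {
  name: "open_settings",
  execute: async (_payload: unknown, ctx: NavaiFunctionContext) => {
    ctx.navigate("/settings");
    return { ok: true };
  }
};
```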

Backend functions

Executed on the server with access to backend resources:
```typescript
// From packages/voice-backend/src/functions.ts:9
export type NavaiFunctionContext = Record<string, unknown>;
```
Frontend functions have priority. If a function with the same name exists in both frontend and backend, only the frontend version will be called.
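One way to get that shadowing behavior is to merge the two registries with the frontend entries inserted last, so a shared name resolves to the frontend implementation. The merge helper here is illustrative, not the package's code.

```typescript
type Fn = (payload: unknown) => unknown;

// Later entries win in a Map built from tuples, so inserting frontend
// functions after backend ones makes the frontend version shadow duplicates.
function mergeRegistries(backend: Map<string, Fn>, frontend: Map<string, Fn>): Map<string, Fn> {
  return new Map([...backend, ...frontend]);
}

const backend = new Map<string, Fn>([["greet", () => "from backend"]]);
const frontend = new Map<string, Fn>([["greet", () => "from frontend"]]);
const merged = mergeRegistries(backend, frontend);
console.log(merged.get("greet")!(undefined)); // "from frontend"
```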

Function execution flow

1. AI agent calls tool: OpenAI’s Realtime API calls execute_app_function or a direct function tool.
2. Frontend checks local registry: The frontend first checks whether the function is registered locally.
3. Execute frontend or backend: If found locally, the function runs immediately. Otherwise, the frontend makes an HTTP call to the backend’s /navai/functions/execute endpoint.
4. Return result to AI: The function result is sent back to OpenAI, which may speak the result or use it to inform the next action.
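Steps 2 and 3 amount to a small dispatcher. This sketch is an assumed shape, with the backend call abstracted behind an injectable executor (which in the real runtime would POST to /navai/functions/execute):

```typescript
type LocalFn = (payload: unknown) => Promise<unknown> | unknown;
type RemoteExec = (name: string, payload: unknown) => Promise<unknown>;

// Prefer the local (frontend) registry; fall back to the backend executor.
async function dispatchFunction(
  name: string,
  payload: unknown,
  local: Map<string, LocalFn>,
  remote: RemoteExec
): Promise<unknown> {
  const fn = local.get(name);
  if (fn) return fn(payload);   // found locally: run in the client
  return remote(name, payload); // otherwise: HTTP call to the backend
}
```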

Configuration and environment

All packages support configuration via environment variables and explicit options:
```bash
OPENAI_API_KEY=sk-...
OPENAI_REALTIME_MODEL=gpt-4o-realtime-preview
OPENAI_REALTIME_VOICE=alloy
OPENAI_REALTIME_INSTRUCTIONS="You are a helpful assistant."
OPENAI_REALTIME_CLIENT_SECRET_TTL=600
NAVAI_FUNCTIONS_FOLDERS="src/ai/functions-modules"
NAVAI_ALLOW_FRONTEND_API_KEY=false
```
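Numeric variables like the TTL arrive as strings and need defensive parsing. A sketch of one reasonable approach (the helper name and fallback handling are assumptions, not the package's code):

```typescript
// Parse a TTL from an environment string, falling back to 600 s when the
// variable is unset or not a positive number.
function resolveTtl(envValue: string | undefined, fallback = 600): number {
  const parsed = Number(envValue);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

console.log(resolveTtl("300"));     // 300
console.log(resolveTtl(undefined)); // 600
console.log(resolveTtl("abc"));     // 600
```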

Security considerations

Never expose your OpenAI API key to client applications. Always use the backend client secret flow to generate ephemeral credentials.
The architecture is designed with security in mind:
  • Server-side API key storage: Your OpenAI API key lives only on your backend
  • Ephemeral client secrets: Short-lived credentials that expire automatically
  • Function allowlisting: Only explicitly registered functions can be executed
  • Route validation: Navigation is restricted to predefined routes
  • Backend-only sensitive operations: Database queries and external API calls stay on the server

Next steps

  • Voice runtime: Learn how the voice runtime manages OpenAI Realtime API connections
  • UI navigation: Understand how voice-controlled navigation works
  • Function execution: Explore how functions are defined and executed
  • Getting started: Build your first voice-enabled app
