Overview
NAVAI implements a three-layer architecture that separates concerns between backend security, client-side execution, and OpenAI’s Realtime API. This design ensures secure credential management while enabling powerful voice-driven interactions.
Three-Layer Architecture
1. Backend Layer (voice-backend)
Purpose: Secure credential management and backend function execution
Key Components:
- Client Secret Handler (~/workspace/source/packages/voice-backend/src/index.ts:160-205)
  - Creates ephemeral client secrets via OpenAI’s /v1/realtime/client_secrets endpoint
  - Configures session with model, voice, and instructions
  - Enforces TTL between 10-7200 seconds (default 600s)
- Runtime Configuration (~/workspace/source/packages/voice-backend/src/runtime.ts:35-74)
  - Scans filesystem for function modules
  - Supports glob patterns for discovery: src/ai/functions-modules/...
  - Excludes node_modules, dist, and hidden directories
- Functions Registry (~/workspace/source/packages/voice-backend/src/functions.ts:278-320)
  - Dynamically loads function modules
  - Normalizes names (e.g., myFunction → my_function)
  - Handles exports: default functions, named exports, classes, objects
The backend never stores the OpenAI API key in client secrets. It uses the key only to request ephemeral credentials that expire automatically.
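The TTL bounds described for the Client Secret Handler can be sketched as a small helper. This is a minimal sketch, not the code in index.ts: the names clampTtlSeconds and expiresAtMs are hypothetical, and it assumes out-of-range values are clamped rather than rejected, which the description above does not specify.

```typescript
// Hypothetical constants mirroring the documented bounds (10-7200 s, default 600 s).
const MIN_TTL_SECONDS = 10;
const MAX_TTL_SECONDS = 7200;
const DEFAULT_TTL_SECONDS = 600;

// Assumption: out-of-range values are clamped; the real handler may reject them instead.
function clampTtlSeconds(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_TTL_SECONDS;
  }
  const whole = Math.floor(requested);
  return Math.min(MAX_TTL_SECONDS, Math.max(MIN_TTL_SECONDS, whole));
}

// Computes the absolute expiry (epoch milliseconds) of a secret issued at `issuedAtMs`.
function expiresAtMs(issuedAtMs: number, requestedTtl?: number): number {
  return issuedAtMs + clampTtlSeconds(requestedTtl) * 1000;
}
```

Keeping the bounds in one helper means the same limits apply whether the TTL comes from configuration or from a request.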
2. Client Layer (voice-frontend / voice-mobile)
Purpose: Execute frontend functions and manage voice interactions
Key Components:
- Agent Builder (~/workspace/source/packages/voice-frontend/src/agent.ts:47-251)
  - Loads local function modules via loadNavaiFunctions()
  - Merges backend function definitions
  - Creates tools: navigate_to, execute_app_function, and direct function aliases
  - Generates dynamic instructions from routes and functions
- Runtime Resolver (~/workspace/source/packages/voice-frontend/src/runtime.ts:44-85)
  - Resolves route modules from src/ai/routes.ts (default)
  - Discovers function modules matching configured patterns
  - Supports environment overrides: NAVAI_ROUTES_FILE, NAVAI_FUNCTIONS_FOLDERS
- Route Resolution (~/workspace/source/packages/voice-frontend/src/routes.ts:16-33)
  - Matches routes by path, name, or synonyms
  - Normalizes Unicode for multilingual support
  - Uses fuzzy matching (contains) as fallback
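The route resolution steps above (exact match on path, name, or synonyms, Unicode normalization, then a contains fallback) might look roughly like the sketch below. The RouteEntry shape and the function names are hypothetical, and the direction of the fallback (the spoken query contains a route label) is an assumption; routes.ts may behave differently.

```typescript
// Hypothetical route shape; the real definition lives in the voice-frontend package.
interface RouteEntry {
  name: string;
  path: string;
  synonyms?: string[];
}

// Lowercases, trims, and strips combining accents so "Configuración" matches "configuracion".
function normalizeText(value: string): string {
  return value
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .trim();
}

// Collects every label a route can be matched by, already normalized.
function candidateLabels(route: RouteEntry): string[] {
  return [route.path, route.name, ...(route.synonyms ?? [])].map(normalizeText);
}

function resolveRoute(query: string, routes: RouteEntry[]): RouteEntry | undefined {
  const needle = normalizeText(query);
  if (needle === "") return undefined;

  // Pass 1: exact match on path, name, or any synonym.
  const exact = routes.find((route) => candidateLabels(route).indexOf(needle) !== -1);
  if (exact) return exact;

  // Pass 2 (assumed fallback): the query contains a label.
  // Single-character labels such as "/" are skipped to avoid matching everything.
  return routes.find((route) =>
    candidateLabels(route).some((label) => label.length > 1 && needle.includes(label))
  );
}
```

With a route whose synonyms include "Configuración", both "configuracion" and "open the preferences page" (given a "preferences" synonym) would resolve to that route under these assumptions.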
The mobile package (voice-mobile) implements the same agent logic but:
- Returns raw tool definitions instead of a RealtimeAgent instance
- Provides extractNavaiRealtimeToolCalls() for parsing OpenAI events
- Uses WebRTC transport instead of WebSocket
3. OpenAI Realtime Layer
Purpose: Natural language understanding and voice synthesis
Interaction Flow:
- Client requests client secret from backend
- Client connects to OpenAI Realtime API with ephemeral token
- OpenAI streams audio input/output and function calls
- Client executes tool calls locally or via backend
- Results are sent back to continue conversation
Component Interactions
Client Secret Lifecycle
- Server API key stays on backend, never sent to client
- Client receives time-limited token (10s - 2h)
- Optional: Allow client-provided API keys with allowApiKeyFromRequest: true
Function Discovery and Loading
Backend Process (runtime.ts:76-84):
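A minimal sketch of the discovery filter, assuming the defaults described for Runtime Configuration (modules under src/ai/functions-modules/, with node_modules, dist, and hidden directories excluded). The constant and function names are hypothetical, the accepted extensions are an assumption, and the real runtime.ts scans the filesystem with configurable glob patterns rather than filtering a list of paths.

```typescript
// Assumed defaults based on the description above; the actual patterns are configurable.
const FUNCTIONS_ROOT = "src/ai/functions-modules/";
const EXCLUDED_DIRS = ["node_modules", "dist"];
const MODULE_EXTENSIONS = [".ts", ".js"]; // assumption: only these extensions are loaded

// Decides whether a project-relative path looks like a loadable function module.
function isFunctionModulePath(relativePath: string): boolean {
  const normalized = relativePath.replace(/\\/g, "/"); // tolerate Windows separators
  if (normalized.indexOf(FUNCTIONS_ROOT) !== 0) return false;

  const segments = normalized.split("/");
  const fileName = segments.pop() ?? "";

  // After pop(), the remaining segments are directories.
  for (const dir of segments) {
    if (EXCLUDED_DIRS.indexOf(dir) !== -1 || dir.charAt(0) === ".") return false;
  }
  return MODULE_EXTENSIONS.some((ext) => fileName.endsWith(ext));
}

// Filters a list of candidate paths and returns them in a stable order.
function discoverFunctionModules(paths: string[]): string[] {
  return paths.filter(isFunctionModulePath).sort();
}
```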
Functions Registry (functions.ts:278-320):
- myFunction → my_function
- MyClass.doAction → my_class_do_action
- Duplicates get numbered: my_function_2, my_function_3
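The naming rules above can be reproduced with a short sketch. The helper names toSnakeCase and registerNames are hypothetical, and the regular expressions are one way to obtain the documented results, not necessarily how functions.ts does it.

```typescript
// Converts camelCase, PascalCase, and dotted names into snake_case.
function toSnakeCase(rawName: string): string {
  return rawName
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2") // myFunction -> my_Function
    .replace(/[^A-Za-z0-9]+/g, "_")         // dots, dashes, spaces -> single underscore
    .replace(/^_+|_+$/g, "")                // trim leading/trailing underscores
    .toLowerCase();
}

// Normalizes a list of names and numbers duplicates: name, name_2, name_3, ...
function registerNames(rawNames: string[]): string[] {
  const seen = new Map<string, number>();
  return rawNames.map((raw) => {
    const base = toSnakeCase(raw);
    const count = (seen.get(base) ?? 0) + 1;
    seen.set(base, count);
    return count === 1 ? base : `${base}_${count}`;
  });
}
```

Under these rules, registering myFunction, my_function, and MyFunction in that order yields my_function, my_function_2, and my_function_3.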
Tool Execution Flow
Frontend Agent (agent.ts:108-163):
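The routing decision behind a tool call might be sketched as follows. This is an illustration under stated assumptions, not the logic in agent.ts: the argument names (target, function_name, payload), the ToolPlan type, and the rule that a local function takes precedence over a backend function of the same name are all hypothetical.

```typescript
// Hypothetical result type describing where a tool call should be executed.
type ToolPlan =
  | { kind: "navigate"; target: string }
  | { kind: "local"; functionName: string; payload: unknown }
  | { kind: "backend"; functionName: string; payload: unknown }
  | { kind: "error"; message: string };

interface ToolCallInput {
  toolName: string;
  args: Record<string, unknown>;
}

function planToolCall(
  call: ToolCallInput,
  localNames: Set<string>,
  backendNames: Set<string>
): ToolPlan {
  if (call.toolName === "navigate_to") {
    const target = call.args["target"]; // assumed argument name
    return typeof target === "string" && target.trim() !== ""
      ? { kind: "navigate", target: target.trim() }
      : { kind: "error", message: "navigate_to requires a non-empty target" };
  }

  // execute_app_function names the function in its arguments (assumed shape);
  // direct aliases use the tool name itself and pass their arguments as the payload.
  const isGeneric = call.toolName === "execute_app_function";
  const requested = isGeneric ? call.args["function_name"] : call.toolName;
  if (typeof requested !== "string" || requested === "") {
    return { kind: "error", message: "missing function name" };
  }
  const payload = isGeneric ? call.args["payload"] : call.args;

  // Assumption: a local function wins when both sides define the same name.
  if (localNames.has(requested)) return { kind: "local", functionName: requested, payload };
  if (backendNames.has(requested)) return { kind: "backend", functionName: requested, payload };
  return { kind: "error", message: `unknown function: ${requested}` };
}
```

Separating the decision from the execution keeps the dispatcher easy to test: the caller can then navigate, invoke the local module, or send the call to the backend based on the returned plan.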
Mobile Agent (~/workspace/source/packages/voice-mobile/src/agent.ts:272-356):
Similar logic, but handles tool calls parsed from OpenAI events:
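A minimal sketch of parsing tool calls out of a Realtime event. It is not the package's extractNavaiRealtimeToolCalls(), whose signature is not shown here; parseToolCalls is a hypothetical stand-in that assumes the event is a response.done message whose response.output array contains function_call items with name, call_id, and a JSON-encoded arguments string.

```typescript
// Hypothetical normalized shape for a tool call extracted from an event.
interface ParsedToolCall {
  callId: string;
  name: string;
  args: Record<string, unknown>;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parses the JSON-encoded arguments string; malformed or non-object input yields {}.
function parseArguments(raw: unknown): Record<string, unknown> {
  if (typeof raw !== "string" || raw.trim() === "") return {};
  try {
    const parsed: unknown = JSON.parse(raw);
    return isRecord(parsed) ? parsed : {};
  } catch {
    return {};
  }
}

// Assumed event shape: { type: "response.done", response: { output: [...] } }.
function parseToolCalls(realtimeEvent: unknown): ParsedToolCall[] {
  if (!isRecord(realtimeEvent) || realtimeEvent["type"] !== "response.done") return [];
  const response = realtimeEvent["response"];
  if (!isRecord(response)) return [];
  const output = response["output"];
  if (!Array.isArray(output)) return [];

  const calls: ParsedToolCall[] = [];
  for (const item of output) {
    if (!isRecord(item) || item["type"] !== "function_call") continue;
    const toolName = item["name"];
    const callId = item["call_id"];
    if (typeof toolName !== "string" || typeof callId !== "string") continue;
    calls.push({ callId, name: toolName, args: parseArguments(item["arguments"]) });
  }
  return calls;
}
```

Each parsed call can then be executed locally or through the backend, and its result sent back to the session keyed by callId.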
Design Principles
Separation of Concerns
- Backend: Security, credentials, privileged operations
- Client: UI navigation, local state, user context
- OpenAI: NLU, voice processing, conversation management
Zero Configuration
- Automatically discovers functions from src/ai/functions-modules/
- Default routes from src/ai/routes.ts
- Override via environment variables or options
Type Safety
- Full TypeScript support across all layers
- Exported types for all registries and definitions
- Runtime validation for function payloads
Extensibility
- Custom function loaders via glob patterns
- Pluggable transport layer (WebSocket/WebRTC)
- Custom route resolution logic
Common Patterns
Hybrid Function Distribution
Context Passing
Class-Based Functions
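As an illustration of the class-based pattern, the sketch below registers each method of a class under the ClassName.method naming rule (MyClass.doAction → my_class_do_action). CartActions, registerClass, and the AppFunction type are hypothetical, and the sketch assumes the loader creates one instance and binds its methods; the actual loader may handle classes differently.

```typescript
type AppFunction = (payload: Record<string, unknown>) => unknown;

// Hypothetical function module exported as a class.
class CartActions {
  private items: string[] = [];

  addItem(payload: Record<string, unknown>): number {
    this.items.push(String(payload["sku"]));
    return this.items.length;
  }

  clearCart(): number {
    this.items = [];
    return 0;
  }
}

function snake(value: string): string {
  return value
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2")
    .replace(/[^A-Za-z0-9]+/g, "_")
    .toLowerCase();
}

// Registers every prototype method as "<class>_<method>" in snake_case,
// bound to the instance so methods keep access to shared state.
function registerClass(instance: object, registry: Map<string, AppFunction>): void {
  const proto = Object.getPrototypeOf(instance);
  const className: string = proto.constructor.name;
  for (const key of Object.getOwnPropertyNames(proto)) {
    if (key === "constructor") continue;
    const member = (instance as Record<string, unknown>)[key];
    if (typeof member !== "function") continue;
    registry.set(snake(`${className}.${key}`), member.bind(instance) as AppFunction);
  }
}

const registry = new Map<string, AppFunction>();
registerClass(new CartActions(), registry);
```

Binding to a single instance lets related functions share state, such as the cart contents here, without resorting to module-level globals.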
Performance Considerations
- Function modules loaded lazily on first request
- Registry cached after initial scan
- Tool schemas generated once at agent initialization
- Client secrets reused until expiration
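Reusing a client secret until it expires can be sketched with a small cache. The ClientSecret shape, the ClientSecretCache class, and the 5-second safety margin are assumptions made for illustration; the source does not say where or how NAVAI caches secrets.

```typescript
interface ClientSecret {
  value: string;
  expiresAtMs: number; // absolute expiry, epoch milliseconds
}

// Pure check: a secret is reusable only if it outlives `nowMs` by the safety margin.
function isReusable(
  secret: ClientSecret | undefined,
  nowMs: number,
  marginMs: number = 5000 // assumed margin to avoid using a token that is about to expire
): boolean {
  return secret !== undefined && secret.expiresAtMs - marginMs > nowMs;
}

class ClientSecretCache {
  private cached: ClientSecret | undefined;
  private readonly fetchSecret: () => Promise<ClientSecret>;

  // `fetchSecret` is whatever call obtains a fresh secret from the backend.
  constructor(fetchSecret: () => Promise<ClientSecret>) {
    this.fetchSecret = fetchSecret;
  }

  // Returns the cached secret while it is still valid; otherwise requests a new one.
  async get(nowMs: number = Date.now()): Promise<ClientSecret> {
    if (isReusable(this.cached, nowMs)) {
      return this.cached as ClientSecret;
    }
    this.cached = await this.fetchSecret();
    return this.cached;
  }
}
```

The margin trades a few extra backend requests for never starting a session with a token that expires mid-handshake.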
Next Steps
Voice Interaction
Learn how voice flows through the system
Function Execution
Deep dive into function loading and invocation
UI Navigation
Understand voice-driven navigation
API Reference
Explore type definitions and interfaces