System Architecture
Highway is built on a modern, scalable architecture that integrates multiple services to deliver automated, AI-powered phone verification. This page provides an in-depth look at the system components, their interactions, and the data flow during a verification call.

High-Level Architecture
Highway consists of four primary components that work together to enable automated identity verification.

Core Components
Frontend Dashboard (Next.js + Mantine UI)
Technology Stack:
- Next.js 14 with React 18
- Mantine UI v7 component library
- TypeScript for type safety
- Supabase client for database access

Key Features:
- Verifications Management: Create and manage customer verification records
- Call Initiation: Trigger automated verification calls
- Real-time Monitoring: View call status and verification results
- Call Logs: Browse historical call records with collapsible details
- JSON Data Editor: Define custom verification data for each customer
Home Page (src/app/page.tsx):
- Displays pending verifications table
- Add verification modal with form validation
- Initiate call button for each verification
- View verification data in formatted JSON
Calls Page (src/app/calls/page.tsx):
- Call logs with status badges
- Collapsible call details
- Verification data display
- Status color coding (successful, unsuccessful, in progress, etc.)

API Utilities (src/utils/api.ts)
Backend Server (Express.js + WebSocket)
Technology Stack:
- Express.js web framework
- express-ws for WebSocket support
- Twilio SDK for phone call management
- OpenAI SDK for Realtime API access
- Supabase client for database operations
- Winston for structured logging
Main Server (index.js):
- Initializes Express app with WebSocket support
- Configures CORS for frontend communication
- Sets up route handlers and WebSocket endpoints
- Starts HTTP server on configured port
Routes (routes.js):
- GET /: Health check endpoint
- POST /call-customer: Initiates Twilio call with TwiML, creates a call record in Supabase, and returns the call SID for tracking
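As a rough sketch of what POST /call-customer hands back to Twilio, the TwiML below connects the answered call to the backend's media stream endpoint. The template and the `buildTwiml` helper are illustrative, not the project's actual code:

```javascript
// Illustrative sketch: build the TwiML that tells Twilio to open a media
// stream back to our WebSocket endpoint. `host`, `verificationId`, and
// `callId` are hypothetical parameters.
function buildTwiml(host, verificationId, callId) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Connect>',
    `    <Stream url="wss://${host}/media-stream/${verificationId}/${callId}" />`,
    '  </Connect>',
    '  <Hangup/>',
    '</Response>',
  ].join('\n');
}
```

The `<Hangup/>` after `<Connect>` matches the call-flow described below: the call ends once the stream closes.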
WebSocket Handler (websocket.js):
- Endpoint: /media-stream/:id/:numid
- Manages bidirectional audio streaming
- Connects to OpenAI Realtime API
- Handles media events from Twilio
- Processes OpenAI responses and function calls
- Updates call status in database
Configuration (config.js):
- Environment variable management
- OpenAI API key and model settings
- Twilio credentials
- System message and voice configuration
- Event logging preferences
Conversation Config (conversationConfig.js):
- Session configuration for OpenAI Realtime API
- Voice activity detection (VAD) settings
- Audio format configuration (g711_ulaw)
- AI instructions and behavior
- Function definitions (hang_up_call, call_reflection_data)
WebSocket Server
The WebSocket server handles real-time audio streaming between Twilio and OpenAI.

Connection Flow:
- Client (Twilio) connects to /media-stream/:id/:numid
- Backend establishes connection to OpenAI Realtime API
- Backend fetches verification data from Supabase
- Session configuration sent to OpenAI
- AI receives system prompt with verification data
- Bidirectional audio streaming begins
From Twilio:
- start: Stream initialization with streamSid
- media: Audio payload in base64 (g711_ulaw), forwarded to OpenAI as input_audio_buffer.append
From OpenAI:
- response.audio.delta: AI voice response audio, forwarded back to Twilio
- response.function_call_arguments.done: Function execution results
- session.updated: Configuration confirmation
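The event handling above can be sketched as a small dispatcher. The `sendToTwilio` and `handleFunctionCall` callbacks are hypothetical stand-ins for the real logic in websocket.js:

```javascript
// Illustrative sketch: route incoming OpenAI Realtime events to actions.
function dispatchOpenAIEvent(event, streamSid, sendToTwilio, handleFunctionCall) {
  switch (event.type) {
    case 'response.audio.delta':
      // Audio deltas are already base64 g711_ulaw; forward to Twilio as-is.
      sendToTwilio({ event: 'media', streamSid, media: { payload: event.delta } });
      return 'media';
    case 'response.function_call_arguments.done':
      // Arguments arrive as a JSON string once the call is complete.
      handleFunctionCall(event.name, JSON.parse(event.arguments));
      return 'function_call';
    case 'session.updated':
      return 'session_confirmed';
    default:
      return 'ignored';
  }
}
```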
AI Function Calls:
- hang_up_call: Ends the call gracefully when verification is complete or the customer explicitly requests to hang up
- call_reflection_data: Updates the call status in the Supabase calls table; takes a status parameter (successful_call, unsuccessful_call, etc.)
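A minimal sketch of how these two function calls might be handled, with `closeCall` and `updateCallStatus` standing in for the real Twilio and Supabase operations (the dispatch logic here is an assumption, not the project's actual code):

```javascript
// Status values the AI may report via call_reflection_data.
const CALL_STATUSES = [
  'in_progress', 'successful_call', 'unsuccessful_call',
  'user_hung_up', 'system_error',
];

// Illustrative sketch: handle the two tool calls the AI can make.
function handleFunctionCall(name, args, { closeCall, updateCallStatus }) {
  if (name === 'hang_up_call') {
    closeCall(); // end the Twilio call gracefully
    return { ok: true };
  }
  if (name === 'call_reflection_data') {
    if (!CALL_STATUSES.includes(args.status)) {
      return { ok: false, error: 'unknown status' };
    }
    updateCallStatus(args.status); // persist to the Supabase calls table
    return { ok: true };
  }
  return { ok: false, error: `unknown function: ${name}` };
}
```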
External Integrations
Twilio Voice Integration
Purpose: Handles phone call infrastructure and audio streaming

Call Initiation (TwiML verbs):
- <Connect>: Establishes WebSocket connection
- <Stream>: Streams audio to/from backend WebSocket
- <Record>: Optional call recording with transcription
- <Hangup>: Ends call after stream closes

Audio Format:
- Format: G.711 μ-law (g711_ulaw)
- Sample Rate: 8 kHz
- Encoding: Base64
- Compatible with OpenAI Realtime API
OpenAI Realtime API
Purpose: Powers the AI voice conversation engine Model: GPT-4o Realtime Preview (2024-10-01) Connection:- Real-time voice-to-voice conversation
- Server-side voice activity detection (VAD)
- Function calling for programmatic actions
- Natural language understanding
- Context-aware questioning
Supabase Database
Purpose: Persistent storage for verifications and call records

Database Schema: a verifications table stores customer verification records, and call records carry a status field with these values:
- in_progress: Call is currently active
- successful_call: Identity verified successfully
- unsuccessful_call: Identity not verified
- user_hung_up: Customer ended call prematurely
- system_error: Technical error occurred
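Both the backend's status updates and the dashboard's badge color coding key off these status values, so a shared mapping is convenient. The color choices below are assumptions; only the status keys come from the schema:

```javascript
// Illustrative mapping of call status values to dashboard badge colors.
const STATUS_COLORS = {
  in_progress: 'blue',
  successful_call: 'green',
  unsuccessful_call: 'red',
  user_hung_up: 'orange',
  system_error: 'gray',
};

// Fall back to a neutral color for any unrecognized status.
function badgeColor(status) {
  return STATUS_COLORS[status] ?? 'gray';
}
```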
Data Flow: Verification Call Lifecycle
Let’s walk through the complete flow of a verification call, from initiation to completion.

Phase 1: Verification Creation
User Creates Verification
User fills out the verification form in the frontend dashboard:
- Name: “John Doe”
- Phone: “5551234567”
- Background: “customer signed up for a loan”
- Verification Data:
{"date of birth": "1990-01-01", "address": "123 Main St"}
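Before this record is inserted, it should pass basic validation. The sketch below is illustrative only; the project uses Joi schemas for the real checks, and the specific rules here are assumptions:

```javascript
// Minimal sketch of the checks a verification record might pass before
// insertion. Field names mirror the form above; the rules are assumptions.
function validateVerification({ name, phone, verificationData }) {
  const errors = [];
  if (!name || !name.trim()) errors.push('name is required');
  if (!/^\d{10}$/.test(phone ?? '')) errors.push('phone must be 10 digits');
  if (typeof verificationData !== 'object' || verificationData === null) {
    errors.push('verificationData must be a JSON object');
  }
  return { valid: errors.length === 0, errors };
}
```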
Phase 2: Call Initiation
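The frontend's request to the backend can be sketched as follows. The /call-customer path comes from the routes section above, while the exact body shape and the `buildCallRequest` helper are assumptions:

```javascript
// Illustrative sketch: the frontend's call-initiation request to the backend.
function buildCallRequest(verification) {
  return {
    url: '/call-customer',
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      phone: verification.phone,
      verificationId: verification.id,
    }),
  };
}
```

The backend then answers with TwiML and returns the Twilio call SID so the dashboard can track the call.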
Phase 3: WebSocket Connection Establishment
Twilio Connects to Backend WebSocket
When the customer answers, Twilio opens a WebSocket connection to /media-stream/12345/67890.

Parameters:
- 12345: Verification ID
- 67890: Call ID
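Express extracts these parameters from the /media-stream/:id/:numid route pattern automatically; as a plain illustration, the parsing amounts to:

```javascript
// Illustrative sketch: extract the two route parameters from the
// media-stream WebSocket path (Express's router does this for real).
function parseMediaStreamPath(path) {
  const match = path.match(/^\/media-stream\/(\d+)\/(\d+)$/);
  if (!match) return null;
  return { verificationId: Number(match[1]), callId: Number(match[2]) };
}
```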
Phase 4: AI Session Configuration
Send Session Update to OpenAI
Backend configures OpenAI session with:
- Voice: “shimmer”
- Audio format: g711_ulaw
- VAD threshold: 0.95
- Temperature: 0.6
- Available functions: hang_up_call, call_reflection_data
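Put together, the session.update message might look like the sketch below. Field names follow the OpenAI Realtime session schema; the instructions string and the abbreviated tool definitions are placeholders:

```javascript
// Illustrative sketch of the session.update message sent to the Realtime
// API, using the settings listed above.
function buildSessionUpdate(instructions) {
  return {
    type: 'session.update',
    session: {
      voice: 'shimmer',
      input_audio_format: 'g711_ulaw',
      output_audio_format: 'g711_ulaw',
      turn_detection: { type: 'server_vad', threshold: 0.95 },
      temperature: 0.6,
      instructions,
      // Abbreviated: real tool definitions also carry descriptions and
      // JSON-schema parameters.
      tools: [
        { type: 'function', name: 'hang_up_call' },
        { type: 'function', name: 'call_reflection_data' },
      ],
    },
  };
}
```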
Phase 5: Real-time Audio Streaming
Customer Speaks
Audio flow:
- Customer speaks into phone
- Twilio captures audio (g711_ulaw)
- Twilio sends a media WebSocket message to the backend
- Backend forwards the audio to OpenAI as input_audio_buffer.append
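The two message shapes involved in this hop can be sketched as a single translation step. The Twilio media event shape follows Twilio's Media Streams format; the base64 g711_ulaw payload passes through untouched:

```javascript
// Illustrative sketch: translate a Twilio media event into an OpenAI
// input_audio_buffer.append event. Non-media events are ignored here.
function toOpenAIAppend(twilioMessage) {
  if (twilioMessage.event !== 'media') return null;
  return {
    type: 'input_audio_buffer.append',
    audio: twilioMessage.media.payload, // base64 stays base64 end to end
  };
}
```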
AI Processes and Responds
OpenAI flow:
- Receives customer audio via WebSocket
- VAD detects speech completion
- Processes audio with GPT-4o Realtime model
- Generates appropriate response
- Converts response to audio
- Sends audio delta chunks (response.audio.delta) back to Twilio via the backend
Phase 6: Verification Questions
AI Asks First Question
AI: “Hi John, this is an automated call from Olive Financial. We need to verify
some information for your loan application. Can you please confirm your date of birth?”
Phase 7: Call Completion
AI Determines Outcome
Based on customer responses, AI determines verification status:
- Successful: Answers match verification data
- Unsuccessful: Answers don’t match or customer refuses
Phase 8: Results Review
Once the call ends, the user reviews the final status badge and call details on the dashboard’s Calls page.
Technology Stack Summary
Frontend
- Framework: Next.js 14
- UI Library: Mantine v7.13
- Language: TypeScript
- State Management: React hooks (useState, useEffect)
- Form Handling: @mantine/form
- Database Client: @supabase/supabase-js
- Icons: @tabler/icons-react
Backend
- Runtime: Node.js
- Framework: Express.js 4.21
- WebSocket: express-ws 5.0
- Language: JavaScript
- Validation: Joi 17.13
- Logging: Winston 3.15
- Database Client: @supabase/supabase-js 2.45
- Phone Service: Twilio SDK 5.3
- AI Service: OpenAI SDK 4.67
Infrastructure
- Database: Supabase (PostgreSQL)
- Voice Gateway: Twilio Voice API
- AI Engine: OpenAI Realtime API (GPT-4o)
- Audio Protocol: WebSocket
- Audio Format: G.711 μ-law
Security Considerations
Key Security Measures:
- Environment Variables: Sensitive data in .env files
- CORS Configuration: Restrict frontend origins in production
- Supabase RLS: Implement Row Level Security policies
- HTTPS/WSS: Use encrypted connections in production
- API Rate Limiting: Implement rate limiting on backend endpoints
- Input Validation: Validate all user inputs with Joi schemas
- Webhook Authentication: Verify Twilio webhook signatures
Scalability Considerations
Current Architecture Limitations:
- Single backend server handles all WebSocket connections
- No horizontal scaling for WebSocket connections
- Database queries not optimized for high volume

Recommended Improvements:
- Load Balancing: Deploy multiple backend instances with sticky sessions
- Redis: Add Redis for session management and caching
- Message Queue: Use RabbitMQ or AWS SQS for async processing
- Connection Pooling: Implement Supabase connection pooling
- CDN: Serve frontend static assets via CDN
- Monitoring: Add Datadog, New Relic, or custom metrics
- Auto-scaling: Configure cloud auto-scaling based on CPU/memory
Next Steps
Configuration Guide
Learn how to customize AI behavior, voice settings, and conversation flow
API Reference
Explore detailed API endpoint documentation and WebSocket event schemas
Deployment Guide
Deploy Highway to production with AWS, Google Cloud, or other providers
Setup & Configuration
Setup guides for backend, frontend, and environment configuration