| Mode | How it works | Status |
|---|---|---|
| Voice Call | STT → LLM → TTS pipeline (Twilio / Telnyx / Plivo) | Production |
| Realtime Call | True bidirectional audio via OpenAI Realtime API | Experimental |
Voice Call mode
The standard pipeline: speech from the caller is transcribed (STT), sent to the LLM, and the response is spoken back (TTS). The assistant can look up information, query APIs, and schedule appointments during the call.
TTS options:
- Kokoro: local, runs on your CPU, no audio leaves your machine
- OpenAI TTS: cloud, high quality
- ElevenLabs: cloud, most natural voices
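
One caller turn through the STT → LLM → TTS pipeline can be sketched roughly as follows. This is a minimal illustration, not GenosOS code: `handle_turn` and the three `fake_*` stages are hypothetical names, and the stubs stand in for real speech and language engines so the example runs without audio or network access.

```python
from typing import Callable

# Hypothetical stage signatures; real STT/LLM/TTS engines would replace the stubs.
Transcribe = Callable[[bytes], str]
Respond = Callable[[str], str]
Synthesize = Callable[[str], bytes]


def handle_turn(audio_in: bytes, stt: Transcribe, llm: Respond, tts: Synthesize) -> bytes:
    """Run one caller turn through STT -> LLM -> TTS and return the reply audio."""
    text = stt(audio_in)   # 1. transcribe the caller's speech
    reply = llm(text)      # 2. generate the assistant's answer
    return tts(reply)      # 3. synthesize the spoken reply


# Stub stages (assumptions for illustration only): "audio" is just UTF-8 text.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Because each stage is passed in as a function, swapping Kokoro for a cloud TTS engine in this sketch would only change the `tts` argument.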
Realtime Call mode
True bidirectional voice using the OpenAI Realtime API. Audio flows directly in and out; there is no intermediate transcription step. Latency is lower and the conversation feels more natural. This mode requires an OpenAI API key with Realtime API access.
Setup with Twilio (most common)
Tell your assistant you want to receive calls:
Create a Twilio account and get a phone number
Sign up at twilio.com. Verify your phone number. In the Twilio Console, go to Phone Numbers → Buy a Number and purchase a number with Voice capability. The number must be in E.164 format (e.g. +15550001234).
Copy your credentials
In the Twilio Console, go to Account Info and copy your Account SID (starts with AC) and Auth Token (32 hex characters).
Expose your webhook (required for inbound calls)
Inbound calls require a publicly reachable URL. GenosOS supports several tunneling options:
- Cloudflare Tunnel (recommended)
- ngrok
- Tailscale Funnel
Cloudflare Tunnel provides a fixed URL that survives restarts, which makes it ideal for production use.
Tell your assistant: “Use Cloudflare Tunnel for voice calls” and provide the tunnel token and hostname.
Provide your credentials to the assistant
Set the inbound policy
| Policy | Behavior |
|---|---|
| disabled | Reject all inbound calls (default) |
| open | Accept calls from any number |
| allowlist | Accept calls only from numbers you specify |
| pairing | Unknown callers hear a pairing prompt |
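
The four policies in the table amount to a small decision function. The sketch below is only an illustration: `decide_inbound`, its return values, and the `known_numbers` parameter are hypothetical names, and it assumes that under `pairing` a caller who is already known is accepted directly, which the table does not state.

```python
def decide_inbound(policy: str, caller: str,
                   known_numbers: frozenset = frozenset()) -> str:
    """Return 'reject', 'accept', or 'pairing_prompt' for an inbound call."""
    if policy == "disabled":
        return "reject"                      # default: no inbound calls at all
    if policy == "open":
        return "accept"                      # any number may call
    if policy == "allowlist":
        return "accept" if caller in known_numbers else "reject"
    if policy == "pairing":
        # Assumption: callers that are already known skip the pairing prompt.
        return "accept" if caller in known_numbers else "pairing_prompt"
    raise ValueError(f"unknown inbound policy: {policy!r}")
```

For example, `decide_inbound("allowlist", "+15550001234", frozenset({"+15550001234"}))` accepts the call, while the same caller under `disabled` is rejected.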
Outbound calls
The assistant can initiate calls using the realtime_call tool. You can also ask for it conversationally. Outbound calls run in one of two modes:
| Mode | Behavior |
|---|---|
| notify | Deliver the message, pause briefly, hang up. Good for reminders. |
| conversation | Deliver the message, listen for a response, continue the call. |
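
A request for an outbound call in either mode might be assembled and validated as sketched here. The field names (`to`, `message`, `mode`) and the `build_outbound_request` helper are assumptions made for illustration; only the `initiate_call` action and the two mode names come from this page. The number check follows the general E.164 shape: a plus sign, then up to 15 digits with no leading zero.

```python
import re

# E.164: '+' followed by 2 to 15 digits, the first of which is not zero.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")
OUTBOUND_MODES = ("notify", "conversation")


def build_outbound_request(to: str, message: str, mode: str = "notify") -> dict:
    """Build a hypothetical initiate_call payload after basic validation."""
    if not E164_PATTERN.fullmatch(to):
        raise ValueError(f"not an E.164 number: {to!r}")
    if mode not in OUTBOUND_MODES:
        raise ValueError(f"mode must be one of {OUTBOUND_MODES}, got {mode!r}")
    # Field names below are illustrative, not a documented schema.
    return {"action": "initiate_call", "to": to, "message": message, "mode": mode}
```

Validating before the call is placed keeps formatting mistakes (a missing `+`, spaces, or dashes in the number) from surfacing later as a provider error.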
Local TTS with Kokoro
Kokoro is an on-device text-to-speech engine included with GenosOS. When used for voice calls, no audio is sent to any cloud service; the TTS runs entirely on your machine’s CPU.
realtime_call tool
The agent uses the realtime_call tool internally to manage calls. The available actions:
| Action | Description |
|---|---|
| initiate_call | Start an outbound call |
| continue_call | Send a follow-up message to an active call |
| speak_to_user | Speak a specific phrase on an active call |
| end_call | Hang up |
| get_status | Check call state and transcript |
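
To make the lifecycle implied by these actions concrete, here is a toy in-memory model. It is a sketch under stated assumptions, not the tool's implementation: the `CallSession` class, its state names (`idle`, `active`, `ended`), and the shape of the returned dictionary are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CallSession:
    """Hypothetical in-memory stand-in for one call; not GenosOS code."""
    state: str = "idle"
    transcript: list = field(default_factory=list)

    def handle(self, action: str, text: str = "") -> dict:
        if action == "initiate_call":
            if self.state != "idle":
                raise RuntimeError("call already started")
            self.state = "active"
            if text:
                self.transcript.append(f"assistant: {text}")
        elif action in ("continue_call", "speak_to_user"):
            self._require_active()
            self.transcript.append(f"assistant: {text}")
        elif action == "end_call":
            self._require_active()
            self.state = "ended"
        elif action != "get_status":
            raise ValueError(f"unknown action: {action!r}")
        # Every action, including get_status, reports state and transcript.
        return {"state": self.state, "transcript": list(self.transcript)}

    def _require_active(self) -> None:
        if self.state != "active":
            raise RuntimeError("no active call")
```

In this model `continue_call` and `speak_to_user` only make sense while a call is active, which matches the "active call" wording in the table.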
Telnyx and Plivo
GenosOS also supports Telnyx and Plivo as voice providers. Tell your assistant which you prefer.
Troubleshooting
Outbound calls fail
Check that the Account SID starts with AC and the Auth Token is exactly 32 hex characters. Verify the credentials in the Twilio Console.
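
The two format checks above are easy to run locally before contacting Twilio. The helper below is a minimal sketch; `check_twilio_credentials` is a hypothetical name, and it checks only the formats described here, not whether the credentials are actually valid.

```python
import re


def check_twilio_credentials(account_sid: str, auth_token: str) -> list:
    """Return a list of format problems; an empty list means both look plausible."""
    problems = []
    if not account_sid.startswith("AC"):
        problems.append("Account SID should start with 'AC'")
    # fullmatch rejects stray whitespace or newlines picked up while copying.
    if not re.fullmatch(r"[0-9a-fA-F]{32}", auth_token):
        problems.append("Auth Token should be exactly 32 hex characters")
    return problems
```

A common cause of a failed check is a trailing space or newline pasted along with the token.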
Inbound calls do not arrive
The webhook URL must be reachable from Twilio’s servers. Verify your tunnel is running and the URL is configured in the Twilio Console under Phone Numbers → your number → Voice webhook.
No audio on calls
Media streaming requires streaming.enabled: true and an OpenAI API key for STT. Ask your assistant: “Check the voice call configuration.”
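
Both prerequisites can be checked in a few lines. This sketch makes two assumptions that may not match your installation: that the voice configuration is available as a nested dictionary with a `streaming` section, and that the OpenAI key is supplied through the conventional `OPENAI_API_KEY` environment variable. `check_media_streaming` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def check_media_streaming(config: Mapping,
                          env: Optional[Mapping] = None) -> list:
    """Return a list of problems that would explain silent calls."""
    env = os.environ if env is None else env
    problems = []
    streaming = config.get("streaming") or {}
    if streaming.get("enabled") is not True:
        problems.append("streaming.enabled must be true")
    # Assumption: the key lives in OPENAI_API_KEY; GenosOS may store it elsewhere.
    if not env.get("OPENAI_API_KEY"):
        problems.append("no OpenAI API key found (needed for STT)")
    return problems
```

An empty result means both conditions from this section are met; any remaining audio problem lies elsewhere.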
Calls drop after a few seconds
The default maxDurationSeconds is 300 (5 minutes). If calls are dropping sooner, check whether the greeting is long enough to exceed the silence timeout.