Voice calls

GenosOS supports two voice call modes. Both handle real phone calls — they differ in how the audio pipeline works.

Mode	How it works	Status
Voice Call	STT → LLM → TTS pipeline (Twilio / Telnyx / Plivo)	Production
Realtime Call	True bidirectional audio via OpenAI Realtime API	Experimental

Voice Call mode

The standard pipeline: speech from the caller is transcribed (STT), sent to the LLM, and the response is spoken back (TTS). The assistant can look up information, query APIs, and schedule appointments during the call.

TTS options:

Kokoro — local, runs on your CPU, no audio leaves your machine
OpenAI TTS — cloud, high quality
ElevenLabs — cloud, most natural voices
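The STT → LLM → TTS flow described above can be pictured as three functions applied in sequence to each caller turn. The following is only an illustrative sketch: `handle_turn` and the three `fake_*` stages are hypothetical names invented here, not part of GenosOS, and the real pipeline streams audio rather than passing whole byte strings.

```python
from typing import Callable


def handle_turn(
    caller_audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one caller turn through the three stages in order."""
    caller_text = transcribe(caller_audio)    # STT: audio -> text
    reply_text = generate_reply(caller_text)  # LLM: text -> reply text
    return synthesize(reply_text)             # TTS: reply text -> audio


# Stand-in stages (hypothetical) so the sketch runs without any provider.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = handle_turn(b"book a table", fake_stt, fake_llm, fake_tts)
```

Swapping the TTS engine (Kokoro, OpenAI TTS, or ElevenLabs) corresponds to replacing only the last stage; the first two stay the same.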

Realtime Call mode

True bidirectional voice using the OpenAI Realtime API. Audio flows directly in and out, with no intermediate transcription step, so latency is lower than in Voice Call mode and the conversation feels more natural. This mode requires an OpenAI API key with Realtime API access.

Setup with Twilio (most common)

Tell your assistant you want to receive calls:

You: "I want to receive phone calls"

The agent asks for your provider preference and guides you through the rest.

Create a Twilio account and get a phone number

Sign up at twilio.com. Verify your phone number. In the Twilio Console, go to Phone Numbers → Buy a Number and purchase a number with Voice capability. The number must be in E.164 format (e.g. +15550001234).
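If you want to check a number before handing it to the assistant, a format check along these lines may help. It is a minimal sketch: `is_e164` is a hypothetical helper, not a GenosOS function, and it checks only the shape of the number (a leading `+`, a non-zero first digit, at most 15 digits in total), not whether the number exists.

```python
import re

# E.164 shape: "+", then 2 to 15 digits, the first of which is not 0.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` is written in E.164 form, e.g. +15550001234."""
    return E164_PATTERN.fullmatch(number) is not None
```

Numbers copied from a website often contain spaces, dashes, or parentheses; those fail this check and need to be stripped first.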

Copy your credentials

In the Twilio Console, go to Account Info and copy your Account SID (starts with AC) and Auth Token (32 hex characters).
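A quick sanity check on the copied values can catch paste errors early. The sketch below tests only the two properties stated above (the `AC` prefix and the 32 hex characters); `credential_problems` is a hypothetical helper, and passing the check does not prove that Twilio will accept the credentials.

```python
import re

# Auth Token: exactly 32 hexadecimal characters.
AUTH_TOKEN_PATTERN = re.compile(r"[0-9a-fA-F]{32}")


def credential_problems(account_sid: str, auth_token: str) -> list[str]:
    """Return a list of format problems; an empty list means both look plausible."""
    problems = []
    if not account_sid.strip().startswith("AC"):
        problems.append("Account SID does not start with AC")
    if AUTH_TOKEN_PATTERN.fullmatch(auth_token.strip()) is None:
        problems.append("Auth Token is not exactly 32 hex characters")
    return problems
```

Both values are stripped before checking because copying from the console tends to pick up surrounding whitespace.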

Expose your webhook (required for inbound calls)

Inbound calls require a publicly reachable URL. GenosOS supports several tunneling options:

Cloudflare Tunnel (recommended)
ngrok
Tailscale Funnel

Cloudflare Tunnel provides a fixed URL that survives restarts — ideal for production use.

# Install cloudflared
brew install cloudflared    # macOS
# or download from developers.cloudflare.com/cloudflare-one/connections/connect-networks/downloads/

# Create a named tunnel (one-time setup)
cloudflared tunnel create genos-voice

Tell your assistant: “Use Cloudflare Tunnel for voice calls” and provide the tunnel token and hostname.

# Install ngrok from ngrok.com, then:
ngrok http 3334

Tell your assistant: “Use ngrok for voice calls” and provide your ngrok auth token.

ngrok free tier URLs change on restart. For stable inbound calling, use a paid ngrok domain or switch to Cloudflare Tunnel.

If you use Tailscale, Funnel provides a stable public URL without a third-party account:

tailscale funnel 3334

Tell your assistant: “Use Tailscale Funnel for voice calls.”

Provide your credentials to the assistant

You: "Twilio Account SID is ACxxxxxxxx, Auth Token is your_token, my number is +15550001234"

The agent stores the credentials in the encrypted vault and configures the voice-call plugin.

Set the inbound policy

You: "Accept calls from anyone"
You: "Only accept calls from +15559876543"
You: "Set a greeting: Hello, how can I help you?"

Inbound policies:

Policy	Behavior
`disabled`	Reject all inbound calls (default)
`open`	Accept calls from any number
`allowlist`	Accept calls only from numbers you specify
`pairing`	Unknown callers hear a pairing prompt
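The four policies can be read as a small decision function over the caller's number. This is a sketch of that logic, not the plugin's code: `inbound_decision` and the returned labels are hypothetical, and two behaviors are assumptions not stated in the table (that `allowlist` rejects unlisted callers, and that `pairing` accepts callers who are already known).

```python
def inbound_decision(policy: str, caller: str, known_callers: set[str]) -> str:
    """Map an inbound policy and caller number to an outcome label."""
    if policy == "disabled":
        return "reject"
    if policy == "open":
        return "accept"
    if policy == "allowlist":
        # Assumption: numbers not on the list are rejected.
        return "accept" if caller in known_callers else "reject"
    if policy == "pairing":
        # Assumption: callers that are already known skip the pairing prompt.
        return "accept" if caller in known_callers else "pairing_prompt"
    raise ValueError(f"unknown inbound policy: {policy!r}")
```

For example, with the policy set to `allowlist` and only +15559876543 listed, a call from any other number falls through to `reject`.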

Restart the gateway

Voice call changes require a gateway restart to take effect. The agent will remind you.

Outbound calls

The assistant can initiate calls using the realtime_call tool. You can also ask for it conversationally:

You: "Call +15559876543 and remind them about tomorrow's appointment"
You: "Call +15559876543 and have a conversation about their order"

Two outbound modes:

Mode	Behavior
`notify`	Deliver the message, pause briefly, hang up. Good for reminders.
`conversation`	Deliver the message, listen for a response, continue the call.

Local TTS with Kokoro

Kokoro is an on-device text-to-speech engine included with GenosOS. When used for voice calls, no audio is sent to any cloud service — the TTS runs entirely on your machine’s CPU.

You: "Use Kokoro for voice call speech"

Kokoro is slower than cloud TTS on low-powered hardware but produces good quality output and keeps all audio local.

realtime_call tool

The agent uses the realtime_call tool internally to manage calls. The available actions:

Action	Description
`initiate_call`	Start an outbound call
`continue_call`	Send a follow-up message to an active call
`speak_to_user`	Speak a specific phrase on an active call
`end_call`	Hang up
`get_status`	Check call state and transcript
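The action names in the table suggest which ones make sense only while a call is in progress. The sketch below encodes that reading. Only the five action names come from the table; the `CallAction` enum, the `action_allowed` helper, and the rule that `end_call` needs an active call are assumptions made for illustration, and the tool's real parameters are not shown here.

```python
from enum import Enum


class CallAction(str, Enum):
    INITIATE_CALL = "initiate_call"
    CONTINUE_CALL = "continue_call"
    SPEAK_TO_USER = "speak_to_user"
    END_CALL = "end_call"
    GET_STATUS = "get_status"


# Actions that act on an active call (assumed to include end_call).
NEEDS_ACTIVE_CALL = frozenset(
    {CallAction.CONTINUE_CALL, CallAction.SPEAK_TO_USER, CallAction.END_CALL}
)


def action_allowed(action: CallAction, call_active: bool) -> bool:
    """Return True if the action makes sense in the current call state."""
    if action in NEEDS_ACTIVE_CALL:
        return call_active
    return True
```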

Telnyx and Plivo

GenosOS also supports Telnyx and Plivo as voice providers. Tell your assistant which you prefer:

You: "I want to use Telnyx for calls"
You: "I want to use Plivo for calls"

The setup flow is the same — the agent asks for the provider-specific credentials.

Troubleshooting

Outbound calls fail

Check that the Account SID starts with AC and the Auth Token is exactly 32 hex characters. Verify the credentials in the Twilio Console.

Inbound calls do not arrive

The webhook URL must be reachable from Twilio’s servers. Verify your tunnel is running and the URL is configured in the Twilio Console under Phone Numbers → your number → Voice webhook.

No audio on calls

Media streaming requires streaming.enabled: true and an OpenAI API key for STT. Ask your assistant: “Check the voice call configuration.”

Calls drop after a few seconds

The default maxDurationSeconds is 300 (5 minutes). If calls are dropping sooner, check whether the greeting is long enough to exceed the silence timeout.
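The configuration-related checks above can be collected into one pass over the settings. This is a rough sketch under stated assumptions: the setting names `streaming.enabled` and `maxDurationSeconds` come from the text, but the nested-dictionary layout, the `diagnose` helper, and the `has_openai_key` flag are hypothetical. Asking the assistant to check the configuration remains the supported route.

```python
DEFAULT_MAX_DURATION_SECONDS = 300  # default stated above (5 minutes)


def diagnose(config: dict, has_openai_key: bool) -> list[str]:
    """Return human-readable findings for the settings named in Troubleshooting."""
    findings = []

    # Assumed layout: {"streaming": {"enabled": True}, "maxDurationSeconds": 300}
    streaming = config.get("streaming") or {}
    if streaming.get("enabled") is not True:
        findings.append("streaming.enabled is not true, so media streaming is off")

    if not has_openai_key:
        findings.append("no OpenAI API key is available for STT")

    max_duration = config.get("maxDurationSeconds", DEFAULT_MAX_DURATION_SECONDS)
    if max_duration < DEFAULT_MAX_DURATION_SECONDS:
        findings.append(
            f"maxDurationSeconds is {max_duration}, below the default of "
            f"{DEFAULT_MAX_DURATION_SECONDS}"
        )
    return findings
```

An empty list means none of these settings explains the problem, which points back to the tunnel, the webhook URL, or the provider credentials.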
