Agentic AI uses OpenAI’s Realtime API for natural voice conversations. The voice parameter controls which voice the AI uses when speaking to callers.

Configuring Voice

Set the voice in your config.yaml file:
openai_realtime:
  enabled: true
  api_key: ${OPENAI_API_KEY}
  model: "gpt-4o-realtime-preview-2024-12-17"
  voice: "alloy"  # Change this to your preferred voice

Available Voices

OpenAI Realtime API provides six distinct voices, each with unique characteristics:

alloy

alloy (string): Neutral, balanced voice
Best for:
  • General-purpose applications
  • Professional business calls
  • When you want a neutral, non-gendered sound
Characteristics: Clear, balanced tone with good enunciation
openai_realtime:
  voice: "alloy"

echo

echo (string): Warm, conversational voice
Best for:
  • Friendly customer service
  • Casual conversations
  • Building rapport with callers
Characteristics: Approachable, warm, and engaging
openai_realtime:
  voice: "echo"

fable

fable (string): Expressive, storytelling voice
Best for:
  • Entertainment applications
  • Narrative-driven interactions
  • Creative or playful scenarios
Characteristics: Dynamic with varied intonation
openai_realtime:
  voice: "fable"

onyx

onyx (string): Deep, authoritative voice
Best for:
  • Professional announcements
  • Serious or formal contexts
  • When authority is important
Characteristics: Lower pitch, confident tone
openai_realtime:
  voice: "onyx"

nova

nova (string): Friendly, upbeat voice
Best for:
  • Customer support
  • Positive, encouraging interactions
  • Energetic applications
Characteristics: Bright, cheerful, and optimistic
openai_realtime:
  voice: "nova"

shimmer

shimmer (string): Soft, gentle voice
Best for:
  • Calming applications
  • Healthcare or wellness scenarios
  • When a soothing tone is desired
Characteristics: Smooth, calm, and reassuring
openai_realtime:
  voice: "shimmer"

Voice Comparison

Here’s a quick reference table to help you choose:
| Voice | Tone | Best For | Gender Perception |
|---------|----------------------|-------------------------------|-----------|
| alloy | Neutral, balanced | General purpose, professional | Neutral |
| echo | Warm, conversational | Customer service, friendly | Masculine |
| fable | Expressive, dynamic | Storytelling, creative | Neutral |
| onyx | Deep, authoritative | Formal, professional | Masculine |
| nova | Friendly, upbeat | Positive interactions | Feminine |
| shimmer | Soft, gentle | Calming, wellness | Feminine |

Testing Voices

To compare voices, place a test call, change the voice in config.yaml, and call again:
  1. Make a test call:
agenticai trigger --to +1234567890 --webhook-url https://your-tunnel.ngrok.io
  2. Edit config.yaml to try a different voice:
openai_realtime:
  voice: "nova"  # Try different voices
  3. Restart the server:
agenticai service restart
  4. Make another call to hear the difference.
Voice changes require a server restart to take effect.

Dynamic Voice Selection

While the default voice is set in config.yaml, you can modify the voice programmatically if needed.

Example: Per-Call Voice Selection

If you want different voices for different types of calls, you can modify the voice in your code:
from agenticai.core.config import load_config

config = load_config()

# Example value; in practice this comes from your own call-routing logic
call_type = "customer_service"

# Override the voice for this specific call
if call_type == "customer_service":
    config.openai_realtime.voice = "nova"
elif call_type == "professional":
    config.openai_realtime.voice = "alloy"
See config.py:90-97 for the OpenAIRealtimeConfig structure.
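The same override can also be written as a lookup table that returns a per-call copy instead of mutating the shared config, which may matter if several calls are handled at once. This is a minimal sketch, not the project's implementation: `RealtimeSettings` is a hypothetical stand-in for `OpenAIRealtimeConfig`, and `VOICE_BY_CALL_TYPE` and `settings_for_call` are names introduced here for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RealtimeSettings:
    """Hypothetical stand-in for OpenAIRealtimeConfig (only the fields used here)."""
    model: str = "gpt-4o-realtime-preview-2024-12-17"
    voice: str = "alloy"


# Assumed mapping from call type to voice, mirroring the if/elif example above.
VOICE_BY_CALL_TYPE = {
    "customer_service": "nova",
    "professional": "alloy",
}


def settings_for_call(defaults: RealtimeSettings, call_type: str) -> RealtimeSettings:
    """Return a per-call copy of the settings; unknown call types keep the default voice."""
    voice = VOICE_BY_CALL_TYPE.get(call_type, defaults.voice)
    return replace(defaults, voice=voice)


defaults = RealtimeSettings(voice="echo")
print(settings_for_call(defaults, "customer_service").voice)  # nova
print(settings_for_call(defaults, "unknown").voice)           # echo
```

Because the dataclass is frozen, the defaults cannot be changed by accident; each call gets its own settings object.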

Voice and Language

All OpenAI Realtime voices support multiple languages with natural accents:
  • English (US, UK, Australian, etc.)
  • Spanish
  • French
  • German
  • Italian
  • Portuguese
  • And many more
The AI will automatically match the caller’s language while maintaining the selected voice’s characteristics.
Language detection and switching happen automatically based on the conversation context.

Configuration Best Practices

Choose your voice based on your use case and audience expectations.

For Customer Service

  • Recommended: nova or echo
  • Friendly, approachable tone puts callers at ease
  • Clear pronunciation ensures understanding

For Professional Business

  • Recommended: alloy or onyx
  • Neutral, authoritative tone conveys competence
  • Appropriate for formal contexts

For Healthcare/Wellness

  • Recommended: shimmer or nova
  • Calming, reassuring tone reduces anxiety
  • Gentle delivery for sensitive topics

For Entertainment/Creative

  • Recommended: fable or nova
  • Expressive delivery engages listeners
  • Dynamic intonation maintains interest
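The recommendations above can be captured in a small lookup so that application code picks a voice by use case. This is only a sketch: the use-case keys, `RECOMMENDED_VOICES`, and `recommend_voice` are hypothetical names, and the `alloy` fallback is an assumption based on the example config.

```python
# Recommendations copied from the use-case lists above; the first entry is the primary pick.
RECOMMENDED_VOICES = {
    "customer_service": ("nova", "echo"),
    "professional": ("alloy", "onyx"),
    "healthcare": ("shimmer", "nova"),
    "entertainment": ("fable", "nova"),
}


def recommend_voice(use_case: str, fallback: str = "alloy") -> str:
    """Return the first recommended voice for a use case, or the fallback if unknown."""
    candidates = RECOMMENDED_VOICES.get(use_case)
    return candidates[0] if candidates else fallback


print(recommend_voice("healthcare"))  # shimmer
print(recommend_voice("sales"))       # alloy
```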

Advanced: Voice Parameters

The OpenAI Realtime API uses the voice parameter exactly as specified. Currently, there are no additional voice customization options, such as:
  • Pitch adjustment
  • Speed control
  • Volume settings
These are handled automatically by the API to ensure natural-sounding speech.
The AI automatically adjusts pacing and tone based on conversation context, regardless of the selected voice.
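For context, in the Realtime API's beta event format the voice is a single field on the session object, set through a `session.update` event. The sketch below builds such a payload. How Agentic AI constructs and sends this event internally is not covered in this guide, so `build_session_update` is a hypothetical helper, and newer API versions may nest the field differently.

```python
import json


def build_session_update(voice: str) -> str:
    """Serialize a session.update event that sets only the voice (beta event shape)."""
    event = {"type": "session.update", "session": {"voice": voice}}
    return json.dumps(event)


print(build_session_update("alloy"))
# {"type": "session.update", "session": {"voice": "alloy"}}
```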

Gemini Voice (Legacy)

If you’re using Gemini instead of OpenAI Realtime, configure the voice in the gemini section:
gemini:
  api_key: ${GEMINI_API_KEY}
  model: "models/gemini-2.5-flash-native-audio-latest"
  voice: "Zephyr"  # Gemini voice options
Available Gemini voices:
  • Zephyr (default)
  • Coral
  • Breeze
  • Nova
Gemini voices are only used when openai_realtime.enabled: false. The default configuration uses OpenAI Realtime for better transcription accuracy.
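The selection rule can be sketched as a function over the parsed config. This is an illustration under stated assumptions, not the project's code: `active_voice` is a hypothetical name, the config is taken to be a plain dict with the same keys as config.yaml, and a missing `enabled` key is assumed to mean true, to match the default configuration.

```python
def active_voice(config: dict) -> tuple[str, str]:
    """Return (provider, voice) according to the openai_realtime.enabled switch."""
    realtime = config.get("openai_realtime", {})
    # Assumption: a missing `enabled` key is treated as true (the documented default setup).
    if realtime.get("enabled", True):
        return "openai_realtime", realtime.get("voice", "alloy")
    # Zephyr is listed above as the default Gemini voice.
    return "gemini", config.get("gemini", {}).get("voice", "Zephyr")


cfg = {
    "openai_realtime": {"enabled": False, "voice": "alloy"},
    "gemini": {"voice": "Coral"},
}
print(active_voice(cfg))  # ('gemini', 'Coral')
```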

Troubleshooting

Voice not changing

If the voice doesn’t change after updating config.yaml:
  1. Restart the server:
agenticai service restart
  2. Verify the config:
grep voice config.yaml
  3. Check logs for configuration errors:
agenticai service logs -f
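If grep shows several voice lines, it helps to see which section each one belongs to. The sketch below is a deliberately simple line scanner, not a YAML parser: it assumes two-level configs like the ones in this guide, and `find_voice_settings` is a hypothetical helper name. It strips the surrounding quotes but keeps any spaces inside them, so stray padding stays visible.

```python
import re

TOP_LEVEL = re.compile(r"^([A-Za-z_][\w-]*):\s*(#.*)?$")
VOICE_LINE = re.compile(r"""^\s+voice:\s*("[^"]*"|'[^']*'|[^\s#]+)""")


def find_voice_settings(yaml_text: str) -> dict[str, str]:
    """Map each top-level section name to the voice value set directly under it."""
    voices: dict[str, str] = {}
    section = None
    for line in yaml_text.splitlines():
        top = TOP_LEVEL.match(line)
        if top:
            section = top.group(1)  # entering a new top-level section
            continue
        found = VOICE_LINE.match(line)
        if found and section:
            voices[section] = found.group(1).strip("\"'")
    return voices


sample = (
    "openai_realtime:\n"
    "  enabled: true\n"
    '  voice: "nova"  # Try different voices\n'
    "gemini:\n"
    '  voice: "Zephyr"\n'
)
print(find_voice_settings(sample))
# {'openai_realtime': 'nova', 'gemini': 'Zephyr'}
```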

Invalid voice name

If you see an error about invalid voice:
  • Check spelling (case-sensitive: alloy, not Alloy)
  • Verify it’s one of the six supported voices
  • Check for stray spaces or mismatched quotes around the value
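The checks above can be sketched as a small validator that reports the most likely cause. The names `SUPPORTED_VOICES` and `check_voice` are hypothetical, and the wording of the hints is illustrative; it does not reproduce the error messages Agentic AI actually emits.

```python
from typing import Optional

# The six voices documented in this guide.
SUPPORTED_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def check_voice(value: str) -> Optional[str]:
    """Return None if the voice is valid, otherwise a hint describing the likely problem."""
    if value in SUPPORTED_VOICES:
        return None
    # Remove surrounding whitespace and quote characters, then re-check.
    cleaned = value.strip().strip("\"'").strip()
    if cleaned in SUPPORTED_VOICES:
        return f"remove extra spaces or quotes: use {cleaned!r}"
    if cleaned.lower() in SUPPORTED_VOICES:
        return f"voice names are case-sensitive: use {cleaned.lower()!r}"
    return f"unknown voice {value!r}; expected one of {', '.join(SUPPORTED_VOICES)}"


print(check_voice("alloy"))   # None
print(check_voice("Alloy"))   # voice names are case-sensitive: use 'alloy'
print(check_voice(" nova "))  # remove extra spaces or quotes: use 'nova'
```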

Voice sounds wrong

If the voice doesn’t match expectations:
  • Try a different voice from the list above
  • Check that openai_realtime.enabled: true
  • Verify your OpenAI API key has Realtime API access
  • Check that audio quality issues aren’t affecting how the voice sounds
