Text-to-Speech (TTS)
SimpleClaw supports multiple TTS providers with automatic fallback and per-message voice customization.
Supported Providers
- Edge TTS - Free, built-in Microsoft Edge voices (default)
- OpenAI TTS - High-quality voices; supported models include gpt-4o-mini-tts, tts-1, and tts-1-hd
- ElevenLabs - Premium voices with fine-grained control
Configuration
Auto Modes
- off - TTS disabled
- always - Convert all responses to speech
- inbound - Only respond with voice when the user sends voice
- tagged - Only when the message contains [[tts]] directives
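The four modes above amount to a small decision rule. The sketch below is one way to express it, not SimpleClaw's actual code: `TtsAutoMode`, `MessageContext`, and `shouldSpeak` are hypothetical names, and it assumes the `tagged` check looks at the text of the message being sent.

```typescript
// Hypothetical types and names for illustration; not taken from src/tts/.
type TtsAutoMode = "off" | "always" | "inbound" | "tagged";

interface MessageContext {
  inboundWasVoice: boolean; // did the user send a voice message?
  text: string;             // text that may carry [[tts]] directives (assumption)
}

function shouldSpeak(mode: TtsAutoMode, ctx: MessageContext): boolean {
  switch (mode) {
    case "off":
      return false;
    case "always":
      return true;
    case "inbound":
      return ctx.inboundWasVoice;
    case "tagged":
      // Matches a bare [[tts]] tag as well as a parameterised [[tts:...]] tag.
      return /\[\[tts(?::[^\]]*)?\]\]/.test(ctx.text);
  }
}
```

For example, `shouldSpeak("inbound", { inboundWasVoice: false, text: "hi" })` is false because the user did not send voice.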
Voice Directives
Control TTS behavior inline using [[tts:...]] tags:
- provider=openai|elevenlabs|edge - Switch provider
- voice=alloy - Change OpenAI voice
- voiceid=<id> - Change ElevenLabs voice
- stability=0.7 - ElevenLabs stability (0-1)
- speed=1.2 - ElevenLabs speed (0.5-2)
- model=tts-1-hd - Override model
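To make the directive syntax concrete, here is a rough parser sketch. It assumes that multiple key=value pairs inside one tag are separated by whitespace, which the text above does not specify. `parseTtsDirectives` and `numericOption` are hypothetical helpers, not functions from src/tts/tts.ts.

```typescript
// Hypothetical helper names; the separator between pairs is an assumption.
interface ParsedTts {
  tagged: boolean;                  // true if any [[tts]] or [[tts:...]] tag was found
  options: Record<string, string>;  // raw key=value pairs, keys lower-cased
  cleaned: string;                  // message text with the tags removed
}

const TTS_TAG = /\[\[tts(?::([^\]]*))?\]\]/g;

function parseTtsDirectives(text: string): ParsedTts {
  const options: Record<string, string> = {};
  let tagged = false;
  const cleaned = text
    .replace(TTS_TAG, (_match: string, body: string | undefined) => {
      tagged = true;
      for (const pair of (body ?? "").split(/\s+/)) {
        const eq = pair.indexOf("=");
        if (eq > 0) options[pair.slice(0, eq).toLowerCase()] = pair.slice(eq + 1);
      }
      return "";
    })
    .replace(/\s{2,}/g, " ")
    .trim();
  return { tagged, options, cleaned };
}

// Reads a numeric option and rejects values outside the documented range,
// e.g. stability (0-1) or speed (0.5-2).
function numericOption(
  options: Record<string, string>,
  key: string,
  min: number,
  max: number,
): number | undefined {
  const raw = options[key];
  if (raw === undefined) return undefined;
  const value = Number(raw);
  return Number.isFinite(value) && value >= min && value <= max ? value : undefined;
}
```

With this sketch, `parseTtsDirectives("[[tts:provider=openai voice=alloy]] Hello there")` yields the options `provider=openai` and `voice=alloy`, and the cleaned text `Hello there`.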
Text Summarization
Long responses are automatically summarized before TTS conversion.
Settings file: ~/.simpleclaw/settings/tts.json
Voice Wake Mode
Voice wake mode allows hands-free activation using your system’s speech recognition.
How It Works
- System listens for wake phrase (e.g., “Hey SimpleClaw”)
- Transcribes your command using macOS dictation or other STT
- Sends command to SimpleClaw agent
- Returns audio response via TTS
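The first two steps can be illustrated by separating the wake phrase from the command in a transcript. This is a minimal sketch that assumes the speech-to-text layer delivers the whole utterance as one string. `extractCommand` is a hypothetical helper, and the default phrase is taken from the example above.

```typescript
// Hypothetical helper; assumes the transcript arrives as a single string.
// Returns the command that follows the wake phrase, or null when the phrase
// is missing or nothing follows it.
function extractCommand(
  transcript: string,
  wakePhrase: string = "hey simpleclaw",
): string | null {
  const normalized = transcript.trim();
  if (!normalized.toLowerCase().startsWith(wakePhrase.toLowerCase())) {
    return null;
  }
  // Drop the wake phrase plus any punctuation or spaces that follow it.
  const rest = normalized
    .slice(wakePhrase.length)
    .replace(/^[\s,.:;!?-]+/, "")
    .trim();
  return rest.length > 0 ? rest : null;
}
```

Only a non-null result would be forwarded to the agent; anything else is ignored as background speech.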
Platform Support
- macOS - Uses built-in dictation and Speech framework
- Linux/Windows - Custom integration required
Example Use Cases
- “Hey SimpleClaw, what’s on my calendar?”
- “Hey SimpleClaw, summarize my unread emails”
- “Hey SimpleClaw, set a reminder for 3pm”
Implementation Details
Voice wake forwarding uses the SimpleClaw CLI:
- Captures speech via system API (src/tts/)
- Shells out to SimpleClaw CLI with transcribed text
- Agent processes request and returns response
- Response is converted to audio via TTS
- Audio plays through system speakers
Provider Fallback
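Fallback here means trying one provider after another until one returns audio. The sketch below shows that pattern under assumptions: the `TtsProvider` interface and `synthesizeWithFallback` are hypothetical, and the order of providers is whatever the caller passes in, since the actual order used by SimpleClaw is not stated here.

```typescript
// Hypothetical interface and function names for illustration only.
interface TtsProvider {
  name: string;
  synthesize: (text: string) => Promise<Uint8Array>;
}

interface TtsResult {
  provider: string;  // name of the provider that succeeded
  audio: Uint8Array; // encoded audio bytes
}

async function synthesizeWithFallback(
  text: string,
  providers: TtsProvider[],
): Promise<TtsResult> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      // First provider that resolves wins; later ones are never called.
      return { provider: provider.name, audio: await provider.synthesize(text) };
    } catch (err) {
      // Record the failure and move on to the next provider in the list.
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All TTS providers failed (${errors.join("; ")})`);
}
```

Collecting the per-provider errors makes the final failure message useful when every provider is misconfigured.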
TTS providers are tried in order with automatic fallback.
Audio Formats
Output format varies by channel:
- Default - MP3 (44.1kHz, 128kbps)
- Telegram - Opus (48kHz, 64kbps) for voice notes
- Telephony - PCM (22-24kHz) for call integrations
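The channel-to-format mapping above can be written as a lookup table. This is a sketch only: the `Channel` and `AudioFormat` types and the `formatForChannel` function are hypothetical, and the values are copied from the list above rather than from the source code.

```typescript
// Hypothetical types; values mirror the list above.
type Channel = "default" | "telegram" | "telephony";

interface AudioFormat {
  codec: "mp3" | "opus" | "pcm";
  sampleRate: string; // kept as text because telephony is given as a range
  bitrate?: string;   // omitted for PCM, where the list gives no bitrate
}

const FORMATS: Record<Channel, AudioFormat> = {
  default: { codec: "mp3", sampleRate: "44.1kHz", bitrate: "128kbps" },
  telegram: { codec: "opus", sampleRate: "48kHz", bitrate: "64kbps" },
  telephony: { codec: "pcm", sampleRate: "22-24kHz" },
};

// Unknown channels fall back to the default MP3 format.
function formatForChannel(channel: string): AudioFormat {
  const key = channel.toLowerCase();
  return key === "telegram" || key === "telephony" ? FORMATS[key] : FORMATS.default;
}
```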
Custom OpenAI Endpoints
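A custom endpoint in this context is a server that accepts the same speech request as OpenAI's `/audio/speech` route. The sketch below builds such a request without sending it. `buildSpeechRequest`, its defaults, and the localhost URL in the usage note are illustrative assumptions, and how SimpleClaw itself is pointed at a custom base URL is not covered here.

```typescript
// Hypothetical helper; builds a request for an OpenAI-compatible speech route.
interface SpeechRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildSpeechRequest(
  baseUrl: string,            // e.g. a self-hosted server's /v1 base URL
  apiKey: string | undefined, // local servers may not require a key
  input: string,
  model: string = "tts-1",
  voice: string = "alloy",
): SpeechRequest {
  // Trim trailing slashes so the path is joined with exactly one "/".
  const url = `${baseUrl.replace(/\/+$/, "")}/audio/speech`;
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  const body = JSON.stringify({ model, input, voice, response_format: "mp3" });
  return { url, init: { method: "POST", headers, body } };
}
```

A caller could then run `fetch(req.url, req.init)` with `req = buildSpeechRequest("http://localhost:8880/v1", undefined, "Hello")` and read the audio bytes from the response. The model and voice names a given server accepts may differ from OpenAI's.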
Custom OpenAI-compatible TTS endpoints (e.g., Kokoro, LocalAI) are supported.
API Reference
Key functions from src/tts/tts.ts:
- textToSpeech() - Convert text to an audio file (src/tts/tts.ts:532)
- textToSpeechTelephony() - PCM audio for telephony (src/tts/tts.ts:702)
- maybeApplyTtsToPayload() - Auto-apply TTS to a response (src/tts/tts.ts:791)
- buildTtsSystemPromptHint() - Add TTS guidance to the system prompt (src/tts/tts.ts:350)
Troubleshooting
No audio output? Check the TTS status and the settings in ~/.simpleclaw/settings/tts.json.