SlasshyWispr includes built-in text-to-speech (TTS) for reading assistant responses aloud. Choose between Piper (fast, zero-Python) and Coqui (high-quality, requires Python).

TTS Engine Selection

ttsEngine
TtsEngine
default:"piper"
Text-to-speech engine
  • piper - Fast, lightweight, zero-dependency TTS (default)
  • coqui - High-quality neural TTS with voice cloning (requires Python setup)
Coqui is disabled in zero-Python mode. If ZERO_PYTHON_MODE = true, only Piper is available.
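As a sketch, assuming settings are persisted as JSON keys named after the fields documented here (the actual storage format isn't shown on this page), switching engines would look like:

```json
{
  "ttsEngine": "coqui"
}
```

Set the value back to "piper" (the default) to return to the zero-Python engine.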

Piper Configuration

Piper is the default TTS engine: lightweight, fast, and requires no external dependencies.
piperPath
string
default:""
Path to the Piper executable.
SlasshyWispr automatically installs and configures Piper on first run. This setting is managed internally.
piperSpeed
number
default:"1.08"
Piper speech rate multiplier
  • Range: 0.5 to 2.0
  • 1.0 = normal speed
  • 1.08 = slightly faster (default)
  • 1.5 = 50% faster
  • 0.8 = 20% slower
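The multiplier maps to playback time as simple division; this is plain arithmetic, not SlasshyWispr code, and the function name is illustrative:

```python
def playback_seconds(base_seconds: float, speed: float) -> float:
    """Duration of a clip synthesized at the given rate multiplier.

    A clip that takes base_seconds at 1.0x finishes in base_seconds / speed.
    """
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5-2.0")
    return base_seconds / speed
```

So a 10-second utterance at the 1.08 default plays in roughly 9.26 seconds, and at 1.5 in roughly 6.67 seconds.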
piperQuality
PiperQuality
default:"fast"
Piper voice quality preset
  • fast - Fastest inference, good quality (default)
  • balanced - Balanced speed and quality
  • high - Best quality, slower inference
piperEmotion
PiperEmotion
default:"neutral"
Piper emotional tone
  • neutral - Standard neutral voice (default)
  • calm - Calming, relaxed tone
  • happy - Upbeat, cheerful tone
  • excited - Energetic, enthusiastic tone
  • serious - Professional, formal tone
  • sad - Somber, low-energy tone
Emotion support depends on the installed voice model. Not all emotions may be available.
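Putting the Piper fields together, and again assuming a JSON settings file keyed by the field names above (an assumption, not shown in this doc), a customized Piper setup might look like:

```json
{
  "ttsEngine": "piper",
  "piperSpeed": 1.08,
  "piperQuality": "fast",
  "piperEmotion": "calm"
}
```

piperPath is omitted deliberately: it is managed internally and should normally be left at its default.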

Coqui Configuration

Coqui TTS provides studio-quality neural voices with voice cloning capabilities.
Coqui requires Python and additional setup. It is disabled when ZERO_PYTHON_MODE = true.
coquiPythonPath
string
default:""
Path to the Python executable used for Coqui.
Must point to a Python installation with the TTS library installed:
pip install TTS
coquiModelName
string
default:"tts_models/multilingual/multi-dataset/xtts_v2"
Coqui TTS model identifier.
The default model is XTTS v2, a multilingual neural TTS model. Other options:
  • tts_models/en/ljspeech/tacotron2-DDC
  • tts_models/en/vctk/vits
  • See Coqui model list: tts --list_models
coquiLanguage
string
default:"en"
Language code for Coqui TTS.
Supported languages depend on the model. XTTS v2 supports:
  • en - English (default)
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • pt - Portuguese
  • zh - Chinese
  • And more…
coquiVoiceId
string
default:""
Voice speaker ID or cloned voice file
  • For built-in voices: speaker ID (e.g., p225, p226)
  • For cloned voices: path to reference audio file
Use the voice cloning feature to create custom voices from audio samples.
coquiSpeed
number
default:"1.0"
Coqui speech rate multiplier
  • Range: 0.5 to 2.0
  • 1.0 = normal speed (default)
  • Values work the same as Piper speed
coquiQuality
CoquiQuality
default:"balanced"
Coqui voice quality preset
  • fast - Faster inference, good quality
  • balanced - Balanced speed and quality (default)
  • high - Best quality, slower inference
coquiEmotion
CoquiEmotion
default:"neutral"
Coqui emotional tone.
Same options as Piper: neutral, calm, happy, excited, serious, sad.
coquiUseGpu
boolean
default:"false"
Enable GPU acceleration for Coqui.
Requires a CUDA-compatible GPU and PyTorch with CUDA support. Benefits:
  • 5-10x faster inference
  • Lower CPU usage
  • Enables real-time TTS for long responses
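Since Coqui runs on PyTorch, a quick way to check whether GPU acceleration is usable is to ask PyTorch itself. This is a generic sketch, not SlasshyWispr's own detection logic, and it degrades gracefully when torch isn't installed:

```python
def gpu_available() -> bool:
    """True if a CUDA-capable device is visible to PyTorch."""
    try:
        import torch  # optional: absent on Piper-only / zero-Python setups
    except ImportError:
        return False
    return torch.cuda.is_available()
```

If this returns False despite having an NVIDIA GPU, the installed PyTorch build likely lacks CUDA support.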
coquiSplitSentences
boolean
default:"false"
Split text into sentences before synthesis.
When enabled, long responses are broken into sentences and synthesized separately for more natural pacing.
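Conceptually, sentence splitting works like the sketch below: break on sentence-ending punctuation, then synthesize each piece separately. SlasshyWispr's actual splitter may differ; this only illustrates the idea:

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split on sentence-ending punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]
```

For example, "Hello there. How are you? Fine!" yields three chunks, each short enough to synthesize with natural pacing.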

Voice Installation

Piper voices are installed automatically:
  1. Open Settings > TTS
  2. Select Piper as TTS engine
  3. Click “Install Voice” (if not already installed)
  4. SlasshyWispr downloads and configures the default voice
  5. Test with “Preview Voice” button
Voice models are stored locally. No internet connection needed after installation.

Voice Cloning (Coqui Only)

Create custom voices from reference audio samples.

Requirements

  • Coqui TTS engine enabled
  • XTTS v2 or compatible cloning model
  • Clean audio sample (5-30 seconds recommended)
  • Single speaker, minimal background noise

Cloning Process

  1. Open Settings > TTS > Coqui
  2. Click “Clone Voice”
  3. Upload or record reference audio (max 30 seconds)
  4. Provide a speaker ID name
  5. SlasshyWispr processes the audio and creates a voice profile
  6. Test with “Preview Voice” button
  7. Select the cloned voice from the voice dropdown
Maximum reference audio length: 30 seconds (from constants.ts:87)
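After cloning, the resulting voice is selected through the Coqui fields documented above. The JSON shape and the reference-audio path below are placeholders, assuming settings are stored under the documented field names:

```json
{
  "ttsEngine": "coqui",
  "coquiModelName": "tts_models/multilingual/multi-dataset/xtts_v2",
  "coquiLanguage": "en",
  "coquiVoiceId": "/path/to/reference.wav"
}
```

For built-in voices, coquiVoiceId would instead hold a speaker ID such as p225.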

Best Practices for Voice Cloning

  • Audio quality: Use high-quality recordings (44.1kHz or higher)
  • Duration: 10-20 seconds is optimal
  • Content: Natural speech, varied intonation
  • Environment: Quiet room, no echo or reverb
  • Speaker: Single speaker only, consistent volume
  • Emotion: Neutral tone for most versatile results
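The measurable guidelines above (5-30 s duration, 44.1 kHz or higher) can be pre-checked before uploading. This helper is illustrative, not part of SlasshyWispr, and only handles uncompressed WAV files via Python's standard wave module:

```python
import wave

def check_reference_audio(path: str) -> list[str]:
    """Return a list of guideline violations for a reference WAV clip."""
    issues = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    if rate < 44_100:
        issues.append(f"sample rate {rate} Hz is below 44.1 kHz")
    if not 5 <= seconds <= 30:
        issues.append(f"duration {seconds:.1f} s is outside the 5-30 s range")
    return issues
```

An empty list means the clip meets the two quantitative guidelines; qualitative ones (single speaker, quiet room, neutral tone) still need a human ear.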

Assistant Name

assistantName
string
default:"Lily"
Display name for the assistant in conversations.
This appears in the home history and TTS announcements.

Troubleshooting

No audio output

  1. Check system volume and output device
  2. Verify the TTS engine is properly installed
  3. Test with “Preview Voice” in settings
  4. Check SlasshyWispr audio permissions
  5. Try switching to the other TTS engine

Coqui TTS not working

  1. Verify the Python version (3.8 or higher required)
  2. Install the TTS library: pip install TTS
  3. Check the Python path in settings
  4. Look for error messages in the SlasshyWispr logs
  5. Try installing in a virtual environment

Poor voice quality

  1. Increase the quality setting (Piper/Coqui quality)
  2. Reduce the speed multiplier
  3. For Coqui: enable GPU acceleration if available
  4. For cloning: use higher-quality reference audio
  5. Try a different voice model

Slow synthesis

  1. Lower the quality setting to “fast”
  2. For Coqui: enable GPU acceleration
  3. Increase the speed multiplier
  4. Switch to Piper for the fastest inference
  5. Disable sentence splitting (Coqui)
