Text-to-Speech (TTS)
Enable TTS
Master toggle for text-to-speech functionality.
When enabled, AI responses are spoken aloud using the configured TTS provider.
TTS Providers
VAssist supports three TTS providers:
- Kokoro (Default)
- OpenAI TTS
- OpenAI-Compatible
Local, high-quality neural TTS
Kokoro-JS runs entirely in your browser using WebAssembly and WebGPU. No API keys or internet required.
Advantages:
- Free and private
- Natural-sounding voices
- Fast generation with WebGPU
- No API costs
- Works offline
Kokoro TTS Configuration
Set to kokoro for local browser-based TTS.
Voice Selection
Choose from 24+ high-quality voices.
American Female (af_):
- af_heart: Warm and friendly (default)
- af_alloy: Clear and professional
- af_aoede: Expressive storyteller
- af_bella: Youthful and energetic
- af_jessica: Confident and articulate
- af_kore: Calm and soothing
- af_nicole: Neutral professional
- af_nova: Bright and engaging
- af_river: Smooth and natural
- af_sarah: Friendly conversational
- af_sky: Light and airy
American Male (am_):
- am_adam: Deep and authoritative
- am_echo: Resonant and clear
- am_eric: Warm and approachable
- am_fenrir: Strong and powerful
- am_liam: Youthful and friendly
- am_michael: Professional narrator
- am_onyx: Rich and smooth
- am_puck: Playful and energetic
- am_santa: Jolly and warm
British Female (bf_):
- bf_alice: Refined British accent
- bf_emma: Clear and elegant
- bf_isabella: Sophisticated
- bf_lily: Gentle and warm
British Male (bm_):
- bm_daniel: Distinguished British
- bm_fable: Narrative storyteller
- bm_george: Classic British
- bm_lewis: Modern British
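The voice identifiers follow a two-letter prefix pattern: the first letter appears to encode the accent and the second the gender. As a rough sketch, assuming that reading of the prefixes holds for every voice, an identifier can be split as follows. The function name `describeVoice` and the `VoiceInfo` shape are hypothetical and not part of VAssist.

```typescript
// Hypothetical helper: decodes a Kokoro voice id such as "af_heart".
// Assumes the prefix convention a/b = American/British, f/m = Female/Male.
interface VoiceInfo {
  accent: "American" | "British";
  gender: "Female" | "Male";
  name: string;
}

function describeVoice(id: string): VoiceInfo | null {
  const match = /^([ab])([fm])_([a-z]+)$/.exec(id);
  if (match === null) {
    return null; // not a Kokoro-style id (for example an OpenAI voice name)
  }
  const accentCode = match[1];
  const genderCode = match[2];
  const name = match[3] ?? "";
  return {
    accent: accentCode === "a" ? "American" : "British",
    gender: genderCode === "f" ? "Female" : "Male",
    name,
  };
}
```

A helper like this could be used to group voices in a settings dropdown by accent and gender.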
Performance Options
Backend for TTS processing.
- auto: Auto-detect (WebGPU if available, else WASM)
- webgpu: GPU acceleration (2-10x faster, requires compatible GPU)
- wasm: CPU fallback (universal compatibility)
WebGPU provides significantly faster generation but requires a compatible GPU. Auto mode automatically falls back to WASM if WebGPU is unavailable.
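The auto-detect rule described above can be sketched as a small pure function. This is an illustration only: the names `DevicePreference`, `Backend` and `resolveBackend` are hypothetical, and the choice to pass an explicit `webgpu` setting through unchanged (rather than silently falling back) is an assumption, since the source only describes fallback for auto mode.

```typescript
// Hypothetical types: the source documents only the values auto, webgpu and wasm.
type DevicePreference = "auto" | "webgpu" | "wasm";
type Backend = "webgpu" | "wasm";

function resolveBackend(preference: DevicePreference, webgpuAvailable: boolean): Backend {
  if (preference !== "auto") {
    // Assumption: an explicit choice is honoured as-is, so the caller can
    // surface an error if WebGPU was requested but is not usable.
    return preference;
  }
  // Auto mode: prefer WebGPU, fall back to WASM.
  return webgpuAvailable ? "webgpu" : "wasm";
}
```

In a browser, `webgpuAvailable` could be derived from whether `navigator.gpu` exists and `navigator.gpu.requestAdapter()` resolves to a non-null adapter.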
Speech rate multiplier.
- Range: 0.5 - 2.0
- 0.5-0.8: Slower, clearer speech
- 1.0: Normal speed (recommended)
- 1.2-2.0: Faster speech
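As a small illustration of the documented range, a value entered by the user could be normalized before it reaches the TTS engine. The function name `clampSpeed` and the fallback to 1.0 for non-numeric input are assumptions, not VAssist behavior.

```typescript
const MIN_SPEED = 0.5;
const MAX_SPEED = 2.0;
const DEFAULT_SPEED = 1.0;

// Hypothetical helper: keeps the speech rate inside the documented 0.5 - 2.0 range.
function clampSpeed(value: number): number {
  if (!Number.isFinite(value)) {
    return DEFAULT_SPEED; // assumption: invalid input falls back to normal speed
  }
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, value));
}
```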
Keep the TTS model loaded in memory.
- Enabled: Faster subsequent generations, uses ~150-300MB RAM
- Disabled: Slower but frees memory between uses
Advanced Settings
Hugging Face model identifier for Kokoro.
Default model is optimized for quality and performance.
Maximum characters per TTS chunk.
- Larger chunks: Fewer API calls, more continuous speech
- Smaller chunks: Lower latency, faster first audio
Minimum characters before creating a chunk.
Prevents extremely short audio segments.
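To show how the maximum and minimum chunk sizes might interact, here is a minimal sentence-based chunker. It is a sketch under stated assumptions, not VAssist's actual algorithm: `chunkText`, `maxChars` and `minChars` are hypothetical names, and the policy of letting the minimum take priority over the maximum is one possible choice.

```typescript
// Hypothetical chunker: groups sentences into chunks of at most maxChars,
// but never closes a chunk that is still shorter than minChars.
function chunkText(text: string, maxChars: number, minChars: number): string[] {
  // Each match is a run of text followed by its sentence-ending punctuation.
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current === "" ? sentence : current + " " + sentence;
    if (candidate.length <= maxChars || current.length < minChars) {
      current = candidate; // still fits, or the chunk is too short to emit
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current !== "") {
    chunks.push(current); // the trailing remainder may be shorter than minChars
  }
  return chunks;
}
```

With this policy a single sentence longer than `maxChars` is kept whole, which favors natural speech over strict size limits.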
OpenAI TTS Configuration
Set to openai for OpenAI’s TTS service.
Your OpenAI API key.
OpenAI TTS model.
- tts-1: Standard quality, faster, lower cost
- tts-1-hd: High-definition quality, slower, higher cost
OpenAI voice selection.
Available voices:
- alloy: Neutral, balanced
- echo: Male, clear
- fable: British male, expressive
- onyx: Deep male voice
- nova: Female, friendly (default)
- shimmer: Female, warm
Speech rate (0.5 - 2.0).
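For orientation, the settings above map onto the fields of OpenAI's speech endpoint (`POST /v1/audio/speech`, with `model`, `input`, `voice` and `speed` in a JSON body). The sketch below only builds the request description. The names `SpeechRequest` and `buildSpeechRequest` are hypothetical, and how VAssist issues the call internally is not documented here.

```typescript
// Hypothetical request descriptor, kept free of fetch so it can be inspected.
interface SpeechRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildSpeechRequest(
  apiKey: string,
  text: string,
  model: "tts-1" | "tts-1-hd" = "tts-1",
  voice: string = "nova",
  speed: number = 1.0,
): SpeechRequest {
  return {
    url: "https://api.openai.com/v1/audio/speech",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // The endpoint expects the text under "input".
    body: JSON.stringify({ model, input: text, voice, speed }),
  };
}
```

The descriptor can then be passed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, and the audio bytes read from the response with `arrayBuffer()`.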
OpenAI-Compatible TTS
Set to openai-compatible for custom endpoints.
TTS API endpoint URL.
Must be compatible with OpenAI’s TTS API format.
API key for the custom endpoint (if required).
Model name or identifier for the TTS service.
Voice name supported by your TTS service.
Speech rate multiplier.
Speech-to-Text (STT)
Enable STT
Master toggle for speech-to-text functionality.
When enabled, users can dictate messages using the microphone button.
STT Providers
- Chrome AI Multimodal
- OpenAI Whisper
- OpenAI-Compatible
Free, local transcription
Uses Chrome’s built-in multimodal AI for speech recognition.
Requirements:
- Chrome 138+
- Multimodal Input flag enabled
Advantages:
- Free and private
- Works offline
- No API costs
- Fast processing
Chrome AI STT Configuration
Set to chrome-ai-multimodal for Chrome’s built-in STT.
Transcription randomness (0.0 - 2.0).
Lower values are recommended for transcription accuracy.
Token selection diversity (1 - 128).
Transcription language.
Supported: en, es, ja
OpenAI STT Configuration
Set to openai for Whisper API.
OpenAI API key.
Whisper model name.
Currently only whisper-1 is available via API.
Input audio language (ISO 639-1 code).
Examples: en, es, fr, de, ja, zh
Specifying the language improves accuracy and reduces latency.
Transcription sampling temperature.
- 0: Deterministic, most accurate
- 0.1-0.5: Slight variability
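The Whisper settings above correspond to form fields on OpenAI's transcription endpoint (`POST /v1/audio/transcriptions`, sent as multipart form data with a `file` plus `model`, and optionally `language` and `temperature`). The sketch below assembles only the text fields. `buildTranscriptionFields` is a hypothetical name, and omitting `language` when it is empty is an assumption about sensible defaults.

```typescript
// Hypothetical helper: collects the non-file form fields for a transcription request.
function buildTranscriptionFields(
  language?: string,
  temperature: number = 0,
): Record<string, string> {
  const fields: Record<string, string> = {
    model: "whisper-1",
    temperature: String(temperature),
  };
  if (language !== undefined && language !== "") {
    // ISO 639-1 code such as "en" or "ja"; left out to let the API auto-detect.
    fields["language"] = language;
  }
  return fields;
}
```

In a browser, each field would be appended to a `FormData` object alongside the recorded audio under the key `file` before the request is sent.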
OpenAI-Compatible STT
Set to openai-compatible.
STT API endpoint.
API key (if required).
Model name.
Transcription language.
Sampling temperature.
Recording Settings
Audio recording format.
- webm: Widely supported, good compression
- mp4: Alternative format
- wav: Uncompressed, larger files
Maximum recording length in seconds.
Prevents excessively long recordings and API timeouts.
Delay (ms) when switching audio devices.
Allows hardware to stabilize before recording.
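Browsers do not all support every recording format, so a recorder typically has to check the requested format before use. The sketch below shows one way this could work. The MIME strings, the fallback order and the name `pickMimeType` are assumptions for illustration, and the support check is injected so the logic can run outside a browser.

```typescript
type RecordingFormat = "webm" | "mp4" | "wav";

// Assumed mapping from the documented format names to audio MIME types.
const MIME_TYPES: Record<RecordingFormat, string> = {
  webm: "audio/webm",
  mp4: "audio/mp4",
  wav: "audio/wav",
};

// Hypothetical helper: returns the preferred MIME type if supported,
// otherwise the first supported alternative, otherwise null.
function pickMimeType(
  preferred: RecordingFormat,
  isSupported: (mime: string) => boolean,
): string | null {
  const order: RecordingFormat[] = [preferred, "webm", "mp4", "wav"];
  for (const format of order) {
    const mime = MIME_TYPES[format];
    if (isSupported(mime)) {
      return mime;
    }
  }
  return null;
}
```

In a browser the predicate could be `(mime) => MediaRecorder.isTypeSupported(mime)`, and the maximum duration could be enforced with a timer that calls `stop()` on the recorder.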
Using Voice Mode
Voice Input
- Enable STT in settings
- Click the microphone icon in the chat input
- Speak your message (up to max duration)
- Click stop or wait for silence detection
- Message is transcribed and ready to send
Voice Output
- Enable TTS in settings
- AI responses are automatically spoken
- Adjust speed in TTS settings if needed
- Change voice to match your preference
Voice Conversation Mode
Enable both TTS and STT for a fully voice-based conversation experience:
- Speak your question → STT transcribes
- AI processes and responds
- TTS reads the response aloud
- Repeat for natural voice interaction
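The loop above can be pictured as one function per conversation turn. This is a conceptual sketch only: `VoiceDeps`, `runVoiceTurn` and the three injected functions are hypothetical stand-ins for the STT provider, the AI backend and the TTS provider, not VAssist internals.

```typescript
// Hypothetical dependencies, injected so each stage can be swapped or mocked.
interface VoiceDeps {
  transcribe: () => Promise<string>;                 // STT: record and transcribe
  askAssistant: (prompt: string) => Promise<string>; // AI: produce a reply
  speak: (text: string) => Promise<void>;            // TTS: read the reply aloud
}

// Runs one turn: speech in, AI reply, speech out.
// Returns the reply, or null when nothing was transcribed.
async function runVoiceTurn(deps: VoiceDeps): Promise<string | null> {
  const prompt = (await deps.transcribe()).trim();
  if (prompt === "") {
    return null; // assumption: silence or an empty transcript skips the turn
  }
  const reply = await deps.askAssistant(prompt);
  await deps.speak(reply);
  return reply;
}
```

Calling `runVoiceTurn` repeatedly until it returns null would give the repeated voice interaction described in the steps.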
Troubleshooting
Kokoro TTS not working
Check:
- Browser supports WebAssembly
- Sufficient memory available (~300MB for model)
- No browser extensions blocking WASM
- Try switching device from webgpu to wasm
OpenAI TTS/STT errors
Common issues:
- Invalid API key
- Insufficient credits
- Rate limiting (reduce request frequency)
- Model name typo
Chrome AI STT unavailable
Microphone not working
Steps:
- Grant microphone permission in browser
- Check system audio settings
- Test microphone in browser settings
- Ensure no other app is using the mic
- Try a different browser if persistent
Audio playback issues
Check:
- Browser audio not muted
- System volume turned up
- No headphone detection issues
- Try different audio output device
- Check browser console for errors