VAssist supports voice-based interactions through text-to-speech (TTS) for AI responses and speech-to-text (STT) for voice input.

Text-to-Speech (TTS)

Enable TTS

ttsConfig.enabled
boolean
default:"false"
Master toggle for text-to-speech functionality. When enabled, AI responses are spoken aloud using the configured TTS provider.

TTS Providers

VAssist supports three TTS providers: Kokoro (local), OpenAI, and OpenAI-compatible endpoints.

Kokoro: Local, high-quality neural TTS. Kokoro-JS runs entirely in your browser using WebAssembly and WebGPU. No API keys or internet connection required.

Advantages:
  • Free and private
  • Natural-sounding voices
  • Fast generation with WebGPU
  • No API costs
  • Works offline

Kokoro TTS Configuration

ttsConfig.provider
string
default:"kokoro"
Set to kokoro for local browser-based TTS.

Voice Selection

ttsConfig.kokoro.voice
string
default:"af_heart"
Choose from 24+ high-quality voices.

American Female (af_):
  • af_heart - Warm and friendly (default)
  • af_alloy - Clear and professional
  • af_aoede - Expressive storyteller
  • af_bella - Youthful and energetic
  • af_jessica - Confident and articulate
  • af_kore - Calm and soothing
  • af_nicole - Neutral professional
  • af_nova - Bright and engaging
  • af_river - Smooth and natural
  • af_sarah - Friendly conversational
  • af_sky - Light and airy
American Male (am_):
  • am_adam - Deep and authoritative
  • am_echo - Resonant and clear
  • am_eric - Warm and approachable
  • am_fenrir - Strong and powerful
  • am_liam - Youthful and friendly
  • am_michael - Professional narrator
  • am_onyx - Rich and smooth
  • am_puck - Playful and energetic
  • am_santa - Jolly and warm
British Female (bf_):
  • bf_alice - Refined British accent
  • bf_emma - Clear and elegant
  • bf_isabella - Sophisticated
  • bf_lily - Gentle and warm
British Male (bm_):
  • bm_daniel - Distinguished British
  • bm_fable - Narrative storyteller
  • bm_george - Classic British
  • bm_lewis - Modern British

Performance Options

ttsConfig.kokoro.device
string
default:"auto"
Backend for TTS processing.
  • auto: Auto-detect (WebGPU if available, else WASM)
  • webgpu: GPU acceleration (2-10x faster, requires compatible GPU)
  • wasm: CPU fallback (universal compatibility)
WebGPU provides significantly faster generation but requires a compatible GPU. Auto mode automatically falls back to WASM if WebGPU is unavailable.
ttsConfig.kokoro.speed
number
default:"1.0"
Speech rate multiplier.
  • Range: 0.5 - 2.0
  • 0.5-0.8: Slower, clearer speech
  • 1.0: Normal speed (recommended)
  • 1.2-2.0: Faster speech
ttsConfig.kokoro.keepModelLoaded
boolean
default:"true"
Keep the TTS model loaded in memory.
  • Enabled: Faster subsequent generations, uses ~150-300MB RAM
  • Disabled: Slower but frees memory between uses

Advanced Settings

ttsConfig.kokoro.modelId
string
default:"onnx-community/Kokoro-82M-v1.0-ONNX"
Hugging Face model identifier for Kokoro. The default model is optimized for quality and performance.
ttsConfig.chunkSize
number
default:"500"
Maximum characters per TTS chunk.
  • Larger chunks: Fewer API calls, more continuous speech
  • Smaller chunks: Lower latency, faster first audio
ttsConfig.minChunkSize
number
default:"100"
Minimum characters before creating a chunk. Prevents extremely short audio segments.
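Putting the settings above together, a Kokoro configuration might look like the sketch below. The keys and defaults come from this reference; the exact shape of the VAssist settings object is an assumption.

```typescript
// Sketch of a ttsConfig using the local Kokoro provider.
// Keys and defaults are from the reference above; the surrounding
// object shape is an assumption about how VAssist stores settings.
const ttsConfig = {
  enabled: true,
  provider: "kokoro",
  chunkSize: 500,    // max characters per TTS chunk
  minChunkSize: 100, // avoid extremely short audio segments
  kokoro: {
    voice: "af_heart",     // warm and friendly (default)
    device: "auto",        // WebGPU if available, else WASM
    speed: 1.0,            // 0.5 - 2.0; 1.0 is normal speed
    keepModelLoaded: true, // faster repeat generations, ~150-300MB RAM
    modelId: "onnx-community/Kokoro-82M-v1.0-ONNX",
  },
};
```

Because everything runs locally, this configuration needs no API key and works offline.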

OpenAI TTS Configuration

ttsConfig.provider
string
Set to openai for OpenAI’s TTS service.
ttsConfig.openai.apiKey
string
required
Your OpenAI API key.
Keep your API key secure. It’s stored locally and never shared.
ttsConfig.openai.model
string
default:"tts-1"
OpenAI TTS model.
  • tts-1: Standard quality, faster, lower cost
  • tts-1-hd: High-definition quality, slower, higher cost
ttsConfig.openai.voice
string
default:"nova"
OpenAI voice selection. Available voices:
  • alloy: Neutral, balanced
  • echo: Male, clear
  • fable: British male, expressive
  • onyx: Deep male voice
  • nova: Female, friendly (default)
  • shimmer: Female, warm
ttsConfig.openai.speed
number
default:"1.0"
Speech rate (0.5 - 2.0).
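An OpenAI TTS setup might look like this sketch. `YOUR_OPENAI_API_KEY` is a placeholder; keys and defaults are from the reference above, and the object shape is an assumption.

```typescript
// Sketch of a ttsConfig using OpenAI's hosted TTS service.
const ttsConfig = {
  enabled: true,
  provider: "openai",
  openai: {
    apiKey: "YOUR_OPENAI_API_KEY", // placeholder - required; stored locally
    model: "tts-1",  // or "tts-1-hd" for higher quality at higher cost
    voice: "nova",   // female, friendly (default)
    speed: 1.0,      // 0.5 - 2.0
  },
};
```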

OpenAI-Compatible TTS

ttsConfig.provider
string
Set to openai-compatible for custom endpoints.
ttsConfig['openai-compatible'].endpoint
string
default:"http://localhost:8000"
required
TTS API endpoint URL. Must be compatible with OpenAI’s TTS API format.
ttsConfig['openai-compatible'].apiKey
string
API key for the custom endpoint (if required).
ttsConfig['openai-compatible'].model
string
default:"tts"
Model name or identifier for the TTS service.
ttsConfig['openai-compatible'].voice
string
default:"default"
Voice name supported by your TTS service.
ttsConfig['openai-compatible'].speed
number
default:"1.0"
Speech rate multiplier.
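For a self-hosted service that speaks OpenAI's TTS API format, the configuration might look like this sketch (keys and defaults from the reference above; object shape is an assumption):

```typescript
// Sketch of a ttsConfig pointing at a custom, OpenAI-compatible endpoint.
const ttsConfig = {
  enabled: true,
  provider: "openai-compatible",
  "openai-compatible": {
    endpoint: "http://localhost:8000", // must implement OpenAI's TTS API format
    apiKey: "",                        // only if your endpoint requires one
    model: "tts",
    voice: "default",                  // any voice your service supports
    speed: 1.0,
  },
};
```

Note the bracket notation (`ttsConfig["openai-compatible"]`) needed to read the hyphenated key.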

Speech-to-Text (STT)

Enable STT

sttConfig.enabled
boolean
default:"false"
Master toggle for speech-to-text functionality. When enabled, users can dictate messages using the microphone button.

STT Providers

VAssist supports three STT providers: Chrome AI (local), OpenAI Whisper, and OpenAI-compatible endpoints.

Chrome AI: Free, local transcription using Chrome’s built-in multimodal AI for speech recognition.

Requirements:
  • Chrome 138+
  • Multimodal Input flag enabled
Advantages:
  • Free and private
  • Works offline
  • No API costs
  • Fast processing

Chrome AI STT Configuration

sttConfig.provider
string
default:"chrome-ai-multimodal"
Set to chrome-ai-multimodal for Chrome’s built-in STT.
sttConfig['chrome-ai-multimodal'].temperature
number
default:"0.1"
Transcription randomness (0.0 - 2.0). Lower values are recommended for transcription accuracy.
sttConfig['chrome-ai-multimodal'].topK
number
default:"3"
Token selection diversity (1 - 128).
sttConfig['chrome-ai-multimodal'].outputLanguage
string
default:"en"
Transcription language. Supported: en, es, ja
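A Chrome AI STT setup might look like this sketch (keys and defaults from the reference above; object shape is an assumption). Remember that it requires Chrome 138+ with the Multimodal Input flag enabled.

```typescript
// Sketch of an sttConfig using Chrome's built-in multimodal AI.
const sttConfig = {
  enabled: true,
  provider: "chrome-ai-multimodal",
  "chrome-ai-multimodal": {
    temperature: 0.1,     // low randomness favors transcription accuracy
    topK: 3,              // token selection diversity, 1 - 128
    outputLanguage: "en", // supported: en, es, ja
  },
};
```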

OpenAI STT Configuration

sttConfig.provider
string
Set to openai for Whisper API.
sttConfig.openai.apiKey
string
required
OpenAI API key.
sttConfig.openai.model
string
default:"whisper-1"
Whisper model name. Currently only whisper-1 is available via the API.
sttConfig.openai.language
string
default:"en"
Input audio language (ISO 639-1 code). Examples: en, es, fr, de, ja, zh
Specifying the language improves accuracy and reduces latency.
sttConfig.openai.temperature
number
default:"0"
Transcription sampling temperature.
  • 0: Deterministic, most accurate
  • 0.1-0.5: Slight variability
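A Whisper-based STT setup might look like this sketch. `YOUR_OPENAI_API_KEY` is a placeholder; keys and defaults are from the reference above, and the object shape is an assumption.

```typescript
// Sketch of an sttConfig using the OpenAI Whisper API.
const sttConfig = {
  enabled: true,
  provider: "openai",
  openai: {
    apiKey: "YOUR_OPENAI_API_KEY", // placeholder - required
    model: "whisper-1",            // only model currently exposed via the API
    language: "en",                // ISO 639-1; improves accuracy and latency
    temperature: 0,                // deterministic, most accurate
  },
};
```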

OpenAI-Compatible STT

sttConfig.provider
string
Set to openai-compatible.
sttConfig['openai-compatible'].endpoint
string
default:"http://localhost:8000"
required
STT API endpoint.
sttConfig['openai-compatible'].apiKey
string
API key (if required).
sttConfig['openai-compatible'].model
string
default:"whisper"
Model name.
sttConfig['openai-compatible'].language
string
default:"en"
Transcription language.
sttConfig['openai-compatible'].temperature
number
default:"0"
Sampling temperature.

Recording Settings

sttConfig.recordingFormat
string
default:"webm"
Audio recording format.
  • webm: Widely supported, good compression
  • mp4: Alternative format
  • wav: Uncompressed, larger files
sttConfig.maxRecordingDuration
number
default:"60"
Maximum recording length in seconds. Prevents excessively long recordings and API timeouts.
sttConfig.audioDeviceSwitchDelay
number
default:"300"
Delay (ms) when switching audio devices. Allows hardware to stabilize before recording.
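The recording settings sit alongside the provider choice and apply regardless of which STT provider is used. A sketch with the documented defaults (object shape is an assumption):

```typescript
// Sketch of provider-independent recording settings.
const sttConfig = {
  enabled: true,
  recordingFormat: "webm",     // good compression, widely supported
  maxRecordingDuration: 60,    // seconds; guards against runaway recordings
  audioDeviceSwitchDelay: 300, // ms for hardware to settle after switching mics
};
```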

Using Voice Mode

Voice Input

  1. Enable STT in settings
  2. Click the microphone icon in the chat input
  3. Speak your message (up to max duration)
  4. Click stop or wait for silence detection
  5. Message is transcribed and ready to send

Voice Output

  1. Enable TTS in settings
  2. AI responses are automatically spoken
  3. Adjust speed in TTS settings if needed
  4. Change voice to match your preference

Voice Conversation Mode

Enable both TTS and STT for a fully voice-based conversation experience:
  1. Speak your question → STT transcribes
  2. AI processes and responds
  3. TTS reads the response aloud
  4. Repeat for natural voice interaction
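The loop above only needs both master toggles switched on. A minimal all-local sketch (provider sub-settings omitted and left at their defaults; object shapes are assumptions):

```typescript
// Minimal all-local voice conversation setup: Kokoro speaks the
// responses, Chrome AI transcribes the questions - free and offline.
const ttsConfig = { enabled: true, provider: "kokoro" };
const sttConfig = { enabled: true, provider: "chrome-ai-multimodal" };
```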

Troubleshooting

Kokoro TTS not working? Check:
  1. Browser supports WebAssembly
  2. Sufficient memory available (~300MB for model)
  3. No browser extensions blocking WASM
  4. Try switching device from webgpu to wasm
WebGPU issues: If GPU acceleration fails, auto mode falls back to WASM automatically.
OpenAI API errors? Common issues:
  • Invalid API key
  • Insufficient credits
  • Rate limiting (reduce request frequency)
  • Model name typo
Check OpenAI dashboard for quota and usage.
Chrome AI STT unavailable? Verify:
  1. Chrome 138+ installed
  2. Multimodal Input flag enabled (chrome://flags)
  3. Chrome restarted after enabling flag
  4. Microphone permission granted
Microphone not working? Steps:
  1. Grant microphone permission in browser
  2. Check system audio settings
  3. Test microphone in browser settings
  4. Ensure no other app is using the mic
  5. Try a different browser if persistent
No audio output? Check:
  1. Browser audio not muted
  2. System volume turned up
  3. No headphone detection issues
  4. Try different audio output device
  5. Check browser console for errors
