VAssist supports voice-based interactions through text-to-speech (TTS) for AI responses and speech-to-text (STT) for voice input.

Text-to-Speech (TTS)

Enable TTS

ttsConfig.enabled
boolean
default:"false"
Master toggle for text-to-speech functionality. When enabled, AI responses are spoken aloud using the configured TTS provider.

TTS Providers

VAssist supports three TTS providers: Kokoro (local), OpenAI, and OpenAI-compatible endpoints.

Kokoro: Local, high-quality neural TTS. Kokoro-JS runs entirely in your browser using WebAssembly and WebGPU. No API keys or internet connection required.

Advantages:
  • Free and private
  • Natural-sounding voices
  • Fast generation with WebGPU
  • No API costs
  • Works offline

Kokoro TTS Configuration

ttsConfig.provider
string
default:"kokoro"
Set to kokoro for local browser-based TTS.

Voice Selection

ttsConfig.kokoro.voice
string
default:"af_heart"
Choose from 24+ high-quality voices.

American Female (af_):
  • af_heart - Warm and friendly (default)
  • af_alloy - Clear and professional
  • af_aoede - Expressive storyteller
  • af_bella - Youthful and energetic
  • af_jessica - Confident and articulate
  • af_kore - Calm and soothing
  • af_nicole - Neutral professional
  • af_nova - Bright and engaging
  • af_river - Smooth and natural
  • af_sarah - Friendly conversational
  • af_sky - Light and airy
American Male (am_):
  • am_adam - Deep and authoritative
  • am_echo - Resonant and clear
  • am_eric - Warm and approachable
  • am_fenrir - Strong and powerful
  • am_liam - Youthful and friendly
  • am_michael - Professional narrator
  • am_onyx - Rich and smooth
  • am_puck - Playful and energetic
  • am_santa - Jolly and warm
British Female (bf_):
  • bf_alice - Refined British accent
  • bf_emma - Clear and elegant
  • bf_isabella - Sophisticated
  • bf_lily - Gentle and warm
British Male (bm_):
  • bm_daniel - Distinguished British
  • bm_fable - Narrative storyteller
  • bm_george - Classic British
  • bm_lewis - Modern British

Performance Options

ttsConfig.kokoro.device
string
default:"auto"
Backend for TTS processing.
  • auto: Auto-detect (WebGPU if available, else WASM)
  • webgpu: GPU acceleration (2-10x faster, requires compatible GPU)
  • wasm: CPU fallback (universal compatibility)
WebGPU provides significantly faster generation but requires a compatible GPU. Auto mode automatically falls back to WASM if WebGPU is unavailable.
ttsConfig.kokoro.speed
number
default:"1.0"
Speech rate multiplier.
  • Range: 0.5 - 2.0
  • 0.5-0.8: Slower, clearer speech
  • 1.0: Normal speed (recommended)
  • 1.2-2.0: Faster speech
ttsConfig.kokoro.keepModelLoaded
boolean
default:"true"
Keep the TTS model loaded in memory.
  • Enabled: Faster subsequent generations, uses ~150-300MB RAM
  • Disabled: Slower but frees memory between uses

Advanced Settings

ttsConfig.kokoro.modelId
string
default:"onnx-community/Kokoro-82M-v1.0-ONNX"
Hugging Face model identifier for Kokoro. The default model is optimized for quality and performance.
ttsConfig.chunkSize
number
default:"500"
Maximum characters per TTS chunk.
  • Larger chunks: Fewer API calls, more continuous speech
  • Smaller chunks: Lower latency, faster first audio
ttsConfig.minChunkSize
number
default:"100"
Minimum characters before creating a chunk. Prevents extremely short audio segments.
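Putting the settings above together, a Kokoro configuration might look like the sketch below. The keys and defaults come from this reference; the exact shape of the VAssist settings object is an assumption.

```typescript
// Sketch of a ttsConfig using the local Kokoro provider.
// Keys and defaults are from the reference above; the surrounding
// object shape is an assumption about how VAssist stores settings.
const ttsConfig = {
  enabled: true,
  provider: "kokoro",
  chunkSize: 500,    // max characters per TTS chunk
  minChunkSize: 100, // avoid extremely short audio segments
  kokoro: {
    voice: "af_heart",     // warm and friendly (default)
    device: "auto",        // WebGPU if available, else WASM
    speed: 1.0,            // 0.5 - 2.0; 1.0 is normal speed
    keepModelLoaded: true, // faster repeat generations, ~150-300MB RAM
    modelId: "onnx-community/Kokoro-82M-v1.0-ONNX",
  },
};
```

Because everything runs locally, this configuration needs no API key and works offline.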

OpenAI TTS Configuration

ttsConfig.provider
string
Set to openai for OpenAI’s TTS service.
ttsConfig.openai.apiKey
string
required
Your OpenAI API key.
Keep your API key secure. It’s stored locally and never shared.
ttsConfig.openai.model
string
default:"tts-1"
OpenAI TTS model.
  • tts-1: Standard quality, faster, lower cost
  • tts-1-hd: High-definition quality, slower, higher cost
ttsConfig.openai.voice
string
default:"nova"
OpenAI voice selection. Available voices:
  • alloy: Neutral, balanced
  • echo: Male, clear
  • fable: British male, expressive
  • onyx: Deep male voice
  • nova: Female, friendly (default)
  • shimmer: Female, warm
ttsConfig.openai.speed
number
default:"1.0"
Speech rate (0.5 - 2.0).
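An OpenAI TTS setup might look like this sketch. `YOUR_OPENAI_API_KEY` is a placeholder; keys and defaults are from the reference above, and the object shape is an assumption.

```typescript
// Sketch of a ttsConfig using OpenAI's hosted TTS service.
const ttsConfig = {
  enabled: true,
  provider: "openai",
  openai: {
    apiKey: "YOUR_OPENAI_API_KEY", // placeholder - required; stored locally
    model: "tts-1",  // or "tts-1-hd" for higher quality at higher cost
    voice: "nova",   // female, friendly (default)
    speed: 1.0,      // 0.5 - 2.0
  },
};
```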

OpenAI-Compatible TTS

ttsConfig.provider
string
Set to openai-compatible for custom endpoints.
ttsConfig['openai-compatible'].endpoint
string
default:"http://localhost:8000"
required
TTS API endpoint URL. Must be compatible with OpenAI’s TTS API format.
ttsConfig['openai-compatible'].apiKey
string
API key for the custom endpoint (if required).
ttsConfig['openai-compatible'].model
string
default:"tts"
Model name or identifier for the TTS service.
ttsConfig['openai-compatible'].voice
string
default:"default"
Voice name supported by your TTS service.
ttsConfig['openai-compatible'].speed
number
default:"1.0"
Speech rate multiplier.
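For a self-hosted service that speaks OpenAI's TTS API format, the configuration might look like this sketch (keys and defaults from the reference above; object shape is an assumption):

```typescript
// Sketch of a ttsConfig pointing at a custom, OpenAI-compatible endpoint.
const ttsConfig = {
  enabled: true,
  provider: "openai-compatible",
  "openai-compatible": {
    endpoint: "http://localhost:8000", // must implement OpenAI's TTS API format
    apiKey: "",                        // only if your endpoint requires one
    model: "tts",
    voice: "default",                  // any voice your service supports
    speed: 1.0,
  },
};
```

Note the bracket notation (`ttsConfig["openai-compatible"]`) needed to read the hyphenated key.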

Speech-to-Text (STT)

Enable STT

sttConfig.enabled
boolean
default:"false"
Master toggle for speech-to-text functionality. When enabled, users can dictate messages using the microphone button.

STT Providers

VAssist supports three STT providers: Chrome AI (local), OpenAI Whisper, and OpenAI-compatible endpoints.

Chrome AI: Free, local transcription using Chrome’s built-in multimodal AI for speech recognition.

Requirements:
  • Chrome 138+
  • Multimodal Input flag enabled
Advantages:
  • Free and private
  • Works offline
  • No API costs
  • Fast processing

Chrome AI STT Configuration

sttConfig.provider
string
default:"chrome-ai-multimodal"
Set to chrome-ai-multimodal for Chrome’s built-in STT.
sttConfig['chrome-ai-multimodal'].temperature
number
default:"0.1"
Transcription randomness (0.0 - 2.0). Lower values are recommended for transcription accuracy.
sttConfig['chrome-ai-multimodal'].topK
number
default:"3"
Token selection diversity (1 - 128).
sttConfig['chrome-ai-multimodal'].outputLanguage
string
default:"en"
Transcription language. Supported: en, es, ja
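A Chrome AI STT setup might look like this sketch (keys and defaults from the reference above; object shape is an assumption). Remember that it requires Chrome 138+ with the Multimodal Input flag enabled.

```typescript
// Sketch of an sttConfig using Chrome's built-in multimodal AI.
const sttConfig = {
  enabled: true,
  provider: "chrome-ai-multimodal",
  "chrome-ai-multimodal": {
    temperature: 0.1,     // low randomness favors transcription accuracy
    topK: 3,              // token selection diversity, 1 - 128
    outputLanguage: "en", // supported: en, es, ja
  },
};
```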

OpenAI STT Configuration

sttConfig.provider
string
Set to openai for Whisper API.
sttConfig.openai.apiKey
string
required
OpenAI API key.
sttConfig.openai.model
string
default:"whisper-1"
Whisper model name. Currently only whisper-1 is available via the API.
sttConfig.openai.language
string
default:"en"
Input audio language (ISO 639-1 code). Examples: en, es, fr, de, ja, zh
Specifying the language improves accuracy and reduces latency.
sttConfig.openai.temperature
number
default:"0"
Transcription sampling temperature.
  • 0: Deterministic, most accurate
  • 0.1-0.5: Slight variability
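A Whisper-based STT setup might look like this sketch. `YOUR_OPENAI_API_KEY` is a placeholder; keys and defaults are from the reference above, and the object shape is an assumption.

```typescript
// Sketch of an sttConfig using the OpenAI Whisper API.
const sttConfig = {
  enabled: true,
  provider: "openai",
  openai: {
    apiKey: "YOUR_OPENAI_API_KEY", // placeholder - required
    model: "whisper-1",            // only model currently exposed via the API
    language: "en",                // ISO 639-1; improves accuracy and latency
    temperature: 0,                // deterministic, most accurate
  },
};
```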

OpenAI-Compatible STT

sttConfig.provider
string
Set to openai-compatible.
sttConfig['openai-compatible'].endpoint
string
default:"http://localhost:8000"
required
STT API endpoint.
sttConfig['openai-compatible'].apiKey
string
API key (if required).
sttConfig['openai-compatible'].model
string
default:"whisper"
Model name.
sttConfig['openai-compatible'].language
string
default:"en"
Transcription language.
sttConfig['openai-compatible'].temperature
number
default:"0"
Sampling temperature.

Recording Settings

sttConfig.recordingFormat
string
default:"webm"
Audio recording format.
  • webm: Widely supported, good compression
  • mp4: Alternative format
  • wav: Uncompressed, larger files
sttConfig.maxRecordingDuration
number
default:"60"
Maximum recording length in seconds. Prevents excessively long recordings and API timeouts.
sttConfig.audioDeviceSwitchDelay
number
default:"300"
Delay (ms) when switching audio devices. Allows hardware to stabilize before recording.
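The recording settings sit alongside the provider choice and apply regardless of which STT provider is used. A sketch with the documented defaults (object shape is an assumption):

```typescript
// Sketch of provider-independent recording settings.
const sttConfig = {
  enabled: true,
  recordingFormat: "webm",     // good compression, widely supported
  maxRecordingDuration: 60,    // seconds; guards against runaway recordings
  audioDeviceSwitchDelay: 300, // ms for hardware to settle after switching mics
};
```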

Using Voice Mode

Voice Input

  1. Enable STT in settings
  2. Click the microphone icon in the chat input
  3. Speak your message (up to max duration)
  4. Click stop or wait for silence detection
  5. Message is transcribed and ready to send

Voice Output

  1. Enable TTS in settings
  2. AI responses are automatically spoken
  3. Adjust speed in TTS settings if needed
  4. Change voice to match your preference

Voice Conversation Mode

Enable both TTS and STT for a fully voice-based conversation experience:
  1. Speak your question → STT transcribes
  2. AI processes and responds
  3. TTS reads the response aloud
  4. Repeat for natural voice interaction
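The loop above only needs both master toggles switched on. A minimal all-local sketch (provider sub-settings omitted and left at their defaults; object shapes are assumptions):

```typescript
// Minimal all-local voice conversation setup: Kokoro speaks the
// responses, Chrome AI transcribes the questions - free and offline.
const ttsConfig = { enabled: true, provider: "kokoro" };
const sttConfig = { enabled: true, provider: "chrome-ai-multimodal" };
```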

Troubleshooting

Kokoro TTS not working? Check:
  1. Browser supports WebAssembly
  2. Sufficient memory available (~300MB for model)
  3. No browser extensions blocking WASM
  4. Try switching device from webgpu to wasm
WebGPU issues: If GPU acceleration fails, auto mode falls back to WASM automatically.
OpenAI API errors? Common issues:
  • Invalid API key
  • Insufficient credits
  • Rate limiting (reduce request frequency)
  • Model name typo
Check OpenAI dashboard for quota and usage.
Chrome AI STT unavailable? Verify:
  1. Chrome 138+ installed
  2. Multimodal Input flag enabled (chrome://flags)
  3. Chrome restarted after enabling flag
  4. Microphone permission granted
Microphone not working? Steps:
  1. Grant microphone permission in browser
  2. Check system audio settings
  3. Test microphone in browser settings
  4. Ensure no other app is using the mic
  5. Try a different browser if persistent
No audio output? Check:
  1. Browser audio not muted
  2. System volume turned up
  3. No headphone detection issues
  4. Try different audio output device
  5. Check browser console for errors
