Configuration
Voxtype uses a TOML configuration file to customize behavior. This guide covers all major configuration options.
Config File Location
Voxtype looks for configuration in this order:
Path specified via -c / --config flag
~/.config/voxtype/config.toml (default)
/etc/voxtype/config.toml (system-wide)
Built-in defaults
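The lookup order can be sketched as a small shell function. This only illustrates the precedence listed above and is not Voxtype's own code: resolve_config is a hypothetical helper, and it assumes an explicitly passed path wins without checking that the file exists.

```shell
# Illustrative only: mimics the documented lookup order.
# resolve_config is a hypothetical helper, not part of Voxtype.
resolve_config() {
    explicit="$1"                                  # path given via -c / --config, if any
    user_cfg="$HOME/.config/voxtype/config.toml"   # per-user default
    system_cfg="/etc/voxtype/config.toml"          # system-wide

    if [ -n "$explicit" ]; then
        printf '%s\n' "$explicit"
    elif [ -f "$user_cfg" ]; then
        printf '%s\n' "$user_cfg"
    elif [ -f "$system_cfg" ]; then
        printf '%s\n' "$system_cfg"
    else
        printf '%s\n' "built-in defaults"
    fi
}
```

Calling resolve_config "" on a machine with neither config file present prints "built-in defaults".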
Creating a Config File
# Create config directory
mkdir -p ~/.config/voxtype
# Write the current config (including defaults) to a new file
voxtype config > ~/.config/voxtype/config.toml
Or copy the example:
cp config/default.toml ~/.config/voxtype/config.toml
Configuration Structure
The config file is organized into sections:
[hotkey] - Key bindings and activation mode
[audio] - Microphone and recording settings
[whisper] - Transcription model and language
[output] - Text output mode and drivers
[text] - Word replacements and spoken punctuation
[meeting] - Meeting transcription settings
[vad] - Voice Activity Detection
[status] - Status bar integration icons
[profiles] - Named configurations for different contexts
State File
Controls where the daemon writes its current state for external integrations (Waybar, scripts).
# Default: auto-detect location ($XDG_RUNTIME_DIR/voxtype/state)
state_file = "auto"
# Or specify a path
# state_file = "/tmp/voxtype-state"
# Or disable
# state_file = "disabled"
The state file contains one of: "idle", "recording", "transcribing", or "outputting".
A state file is required for the voxtype record toggle and voxtype status commands.
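A status-bar script might read the state file along these lines. This is a sketch under assumptions: state_label and its output labels are hypothetical, and the file is assumed to hold the bare state word, optionally followed by a newline.

```shell
# Hypothetical helper for a status-bar script; not part of Voxtype.
# Assumes the state file holds one bare word such as "recording".
state_label() {
    # Default path follows the documented auto-detect location.
    state_file="${1:-${XDG_RUNTIME_DIR:-}/voxtype/state}"
    if [ ! -r "$state_file" ]; then
        echo "off"        # no readable state file (daemon stopped or state_file disabled)
        return 0
    fi
    state=$(tr -d '[:space:]' < "$state_file")
    case "$state" in
        idle)         echo "ready" ;;
        recording)    echo "REC" ;;
        transcribing) echo "busy" ;;
        outputting)   echo "typing" ;;
        *)            echo "unknown" ;;
    esac
}
```

With state_file = "/tmp/voxtype-state" in the config, the script would call state_label /tmp/voxtype-state.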
[hotkey] - Key Bindings
Controls the built-in evdev hotkey detection.
Basic Configuration
[hotkey]
# Enable built-in hotkey (false if using compositor keybindings)
enabled = true
# Main hotkey (evdev key name)
key = "SCROLLLOCK"
# Optional modifier keys
modifiers = [] # e.g., ["LEFTCTRL", "LEFTALT"]
# Activation mode: "push_to_talk" or "toggle"
mode = "push_to_talk"
Available Keys
Common hotkeys:
SCROLLLOCK - Scroll Lock key (default)
PAUSE - Pause/Break key
RIGHTALT - Right Alt key
F13 through F24 - Extended function keys
MEDIA - Media key
RECORD - Record key
INSERT, HOME, END, PAGEUP, PAGEDOWN, DELETE
Use evtest to find key names:
sudo evtest
# Select keyboard, press key, note KEY_XXXXX name (without KEY_ prefix)
Numeric Keycodes
If your key isn’t in the built-in list, specify by keycode:
[hotkey]
# Use one of the following forms:
key = "WEV_234" # XKB keycode from wev/xev
# key = "EVTEST_226" # Kernel keycode from evtest
# key = "WEV_0xEA" # Hex also works
Prefixes:
WEV_, X11_, XEV_ - XKB keycode (offset by 8 from kernel)
EVTEST_ - Kernel keycode
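The offset between the two numbering schemes is plain arithmetic, which explains why WEV_234 and EVTEST_226 in the example above name the same key. A minimal sketch (xkb_to_kernel is a hypothetical helper name):

```shell
# Convert an XKB keycode (as shown by wev/xev) to a kernel keycode (as shown by evtest).
# XKB keycodes are the kernel keycode plus 8.
xkb_to_kernel() {
    echo $(( $1 - 8 ))
}

xkb_to_kernel 234    # prints 226
xkb_to_kernel 0xEA   # hex input, also prints 226
```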
Cancel Key
Abort recording or transcription without outputting text:
[hotkey]
cancel_key = "ESC" # Press Escape to cancel
Modifier Key for Secondary Model
Use a different model when holding a modifier:
[hotkey]
model_modifier = "LEFTSHIFT" # Shift + hotkey uses secondary model
[whisper]
model = "base.en"
secondary_model = "large-v3-turbo"
[audio] - Audio Settings
[audio]
# Audio input device ("default" or device name from pactl)
device = "default"
# Sample rate in Hz (whisper expects 16000)
sample_rate = 16000
# Maximum recording duration in seconds (safety limit)
max_duration_secs = 60
Finding Device Names
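List input sources with pactl (for example, pactl list sources short) and use the source name as the device value. Example:
[audio]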
device = "alsa_input.usb-Blue_Microphones_Yeti-00.analog-stereo"
Audio Feedback
Sound cues when recording starts/stops:
[audio.feedback]
enabled = true
theme = "default" # "default", "subtle", "mechanical", or path
volume = 0.7 # 0.0 to 1.0
Built-in themes:
default - Clear two-tone beeps
subtle - Quiet clicks
mechanical - Typewriter sounds
Custom theme:
[audio.feedback]
enabled = true
theme = "/home/user/.config/voxtype/sounds"
volume = 0.8
Custom theme directory must contain: start.wav, stop.wav, error.wav
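Before pointing theme at a directory, it can help to confirm the three required files are present. The function below is a hypothetical check written for this guide, not a Voxtype command:

```shell
# Hypothetical pre-flight check for a custom sound theme directory.
# Reports which of the three required files are missing, if any.
check_theme_dir() {
    dir="$1"
    missing=""
    for f in start.wav stop.wav error.wav; do
        [ -f "$dir/$f" ] || missing="$missing $f"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

For the example above, the call would be check_theme_dir /home/user/.config/voxtype/sounds.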
[whisper] - Transcription
Basic Configuration
[whisper]
# Execution mode: "local", "remote", or "cli"
mode = "local"
# Model to use
model = "base.en"
# Language: "en", "auto", or ["en", "fr"] for constrained detection
language = "en"
# Translate non-English speech to English
translate = false
# Number of CPU threads (omit for auto-detect)
# threads = 4
Available Models
Model           Size    Speed    Accuracy   Languages
tiny.en         39 MB   Fastest  Good       English only
base.en         142 MB  Fast     Better     English only
small.en        466 MB  Medium   Great      English only
medium.en       1.5 GB  Slow     Excellent  English only
large-v3        3.1 GB  Slowest  Best       99 languages
large-v3-turbo  1.6 GB  Fast     Excellent  99 languages
.en models are English-only but faster and more accurate for English. large-v3-turbo is recommended for GPU users.
Language Configuration
Three modes:
1. Single language (fastest, most accurate):
[whisper]
language = "en" # or "fr", "es", "de", "ja", "zh", etc.
2. Auto-detect from all languages:
[whisper]
language = "auto"
model = "large-v3" # Multilingual model required
3. Constrained auto-detect (recommended for multilingual users):
[whisper]
language = ["en", "fr"] # Auto-detect between English and French
model = "large-v3"
Constrained detection is more accurate for short sentences where Whisper might misdetect the language.
Multi-Model Configuration
Configure multiple models:
[whisper]
model = "base.en" # Primary model
# Secondary model (used with hotkey.model_modifier or CLI --model)
secondary_model = "large-v3-turbo"
# Additional available models
available_models = ["medium.en"]
# Model caching (only applies when gpu_isolation = false)
max_loaded_models = 2 # LRU eviction
cold_model_timeout_secs = 300 # 5 minutes before unloading idle models
Use via CLI:
voxtype record start --model large-v3-turbo
On-Demand Loading
Load model only when recording starts:
[whisper]
on_demand_loading = true
Trade-off:
Pros: Saves memory/VRAM when idle
Cons: Slight delay at start of first recording
GPU Memory Isolation
Run transcription in a subprocess that exits after completion:
[whisper]
gpu_isolation = true
When to use:
Laptops with hybrid graphics (allows dGPU to sleep)
Limited VRAM (releases GPU memory between recordings)
Trade-off: Slightly slower per recording (model load/unload overhead)
Context Window Optimization
Use smaller context window for short recordings:
[whisper]
context_window_optimization = false # Disabled by default
Some models (especially large-v3 and large-v3-turbo) may experience repetition loops with this enabled. Only enable if you experience faster transcription without issues.
Initial Prompt
Provide context to improve accuracy:
[whisper]
initial_prompt = "Technical discussion about Rust, TypeScript, and Kubernetes."
Use for:
Domain-specific terminology
Proper nouns (names, products)
Formatting conventions
Remote Whisper Server
Send audio to a remote server:
[whisper]
mode = "remote"
remote_endpoint = "http://192.168.1.100:8080" # whisper.cpp server
remote_model = "whisper-1" # Model name to request
remote_timeout_secs = 30
# Optional: API key (or use VOXTYPE_WHISPER_API_KEY env var)
# remote_api_key = "sk-..."
OpenAI API:
[whisper]
mode = "remote"
remote_endpoint = "https://api.openai.com"
remote_model = "whisper-1"
remote_api_key = "sk-proj-..."
CLI Backend (whisper-cli)
Use whisper-cli subprocess instead of FFI:
[whisper]
mode = "cli"
whisper_cli_path = "/usr/local/bin/whisper-cli" # Optional
Fallback for systems where whisper-rs crashes (e.g., glibc 2.42+).
[output] - Text Output
Output Mode
[output]
# Primary output mode: "type", "clipboard", "paste", or "file"
mode = "type"
# Fall back to clipboard if typing fails
fallback_to_clipboard = true
Modes:
type - Simulates keyboard input at cursor (wtype → dotool → ydotool → clipboard)
clipboard - Copies text to clipboard (wl-clipboard)
paste - Copies to clipboard then simulates Ctrl+V
file - Writes to a file
See the Output Modes guide for details.
Driver Order
Customize the fallback chain for type mode:
[output]
driver_order = ["wtype", "dotool", "ydotool", "clipboard"]
Available drivers:
wtype - Wayland virtual keyboard (best Unicode/CJK support, no daemon)
dotool - uinput with keyboard layout support (no daemon)
ydotool - uinput fallback (requires ydotoold daemon)
clipboard - wl-clipboard (universal fallback)
Examples:
# Prefer ydotool over dotool
driver_order = ["wtype", "ydotool", "dotool", "clipboard"]
# Use only ydotool (no fallback)
driver_order = ["ydotool"]
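To see which driver a given order would likely land on, you can check which binaries are installed, in order. This is a rough sketch: first_available is a hypothetical helper, it only tests whether a command exists on PATH, and Voxtype's own fallback may also react to runtime failures. It assumes the clipboard driver corresponds to the wl-copy binary from wl-clipboard.

```shell
# Hypothetical helper: print the first command from the argument list found on PATH.
# Returns non-zero if none of them is installed.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1
}
```

For the default order, the call would be first_available wtype dotool ydotool wl-copy.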
Typing Delays
[output]
# Delay before typing starts (ms)
pre_type_delay_ms = 0
# Delay between each character (ms)
type_delay_ms = 0
Increase if characters are dropped or typed out of order.
Auto-Submit
Automatically press Enter after output:
[output]
auto_submit = true
Useful for chat apps, terminals, or forms.
Shift+Enter Newlines
Convert newlines to Shift+Enter instead of Enter:
[output]
shift_enter_newlines = true
Useful for apps where Enter submits (Slack, Discord, Cursor IDE).
Paste Mode Settings
[output]
mode = "paste"
# Keystroke for paste (default: "ctrl+v")
paste_keys = "ctrl+v" # or "shift+insert", "ctrl+shift+v"
# Restore clipboard after paste
restore_clipboard = false
restore_clipboard_delay_ms = 200
File Output
[output]
mode = "file"
file_path = "/tmp/voxtype-output.txt"
file_mode = "overwrite" # or "append"
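The difference between the two file_mode values is the usual one between truncating and appending. A sketch of the semantics, assuming each transcription is written as one line (write_output is a hypothetical function, not Voxtype's implementation):

```shell
# Hypothetical illustration of file_mode semantics.
# Assumption: each transcription is written followed by a newline.
write_output() {
    file_mode="$1"
    file_path="$2"
    text="$3"
    case "$file_mode" in
        overwrite) printf '%s\n' "$text" > "$file_path" ;;    # replace previous contents
        append)    printf '%s\n' "$text" >> "$file_path" ;;   # keep a running log
        *)         echo "unknown file_mode: $file_mode" >&2; return 1 ;;
    esac
}
```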
Pre/Post Output Hooks
Run commands before/after typing:
[output]
# Run before typing starts
pre_output_command = "hyprctl dispatch submap voxtype_suppress"
# Run after typing completes
post_output_command = "hyprctl dispatch submap reset"
Used for compositor integration to block modifier keys during typing.
For non-US keyboard layouts, set the XKB layout used by the dotool driver:
[output]
dotool_xkb_layout = "de" # German
dotool_xkb_variant = "nodeadkeys" # Optional
See the Output Modes guide for more details.
Notifications
[output.notification]
on_recording_start = false # Notify when recording starts
on_recording_stop = false # Notify when transcription begins
on_transcription = true # Show transcribed text
Post-Processing Command
Pipe transcriptions through an external command:
[output.post_process]
command = "ollama run llama3.2:1b 'Clean up this dictation. Fix grammar, remove filler words:'"
timeout_ms = 30000 # 30 seconds
See the Text Processing guide for examples.
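The post-processing command does not have to be an LLM. A minimal sketch of a plain text filter, assuming the command receives the transcription on standard input and that its standard output becomes the final text (the word "pipe" suggests this, but verify against the Text Processing guide); tidy_dictation is a hypothetical name:

```shell
# Hypothetical post-processing filter: squeeze repeated spaces and trim the ends.
# Assumption: transcription arrives on stdin, cleaned text is read from stdout.
tidy_dictation() {
    tr -s ' ' | sed -e 's/^ //' -e 's/ $//'
}
```

Saved as an executable script that calls the function, it could be referenced from the command setting in place of the ollama invocation.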
[text] - Text Processing
Word Replacements
Fix commonly misheard words:
[text]
replacements = { "vox type" = "voxtype", "oh marky" = "Omarchy" }
Matching is case-insensitive.
Spoken Punctuation
Convert spoken words to symbols:
[text]
spoken_punctuation = true
Examples:
“period” → .
“comma” → ,
“open paren” → (
“close paren” → )
“new line” → \n
See the Text Processing guide for the full list.
[meeting] - Meeting Transcription
Continuous transcription for meetings:
[meeting]
enabled = false
chunk_duration_secs = 30
storage_path = "auto" # ~/.local/share/voxtype/meetings/
retain_audio = false
max_duration_mins = 180 # 3 hours
[meeting.audio]
mic_device = "default"
loopback_device = "auto" # Capture remote participants
echo_cancel = "auto" # Remove speaker bleed-through
[meeting.diarization]
enabled = true
backend = "simple" # "simple", "ml", or "remote"
max_speakers = 10
[meeting.summary]
backend = "disabled" # "local", "remote", or "disabled"
ollama_url = "http://localhost:11434"
ollama_model = "llama3.2"
timeout_secs = 120
See the Meeting Mode documentation for details.
[vad] - Voice Activity Detection
Filter silence-only recordings:
[vad]
enabled = false # Enable VAD
backend = "auto" # "auto", "energy", or "whisper"
threshold = 0.5 # 0.0 (sensitive) to 1.0 (aggressive)
min_speech_duration_ms = 100
Backends:
auto - Whisper VAD for Whisper engine, Energy VAD for ONNX engines
energy - RMS-based detection (no model needed)
whisper - Silero VAD via whisper-rs (requires model download)
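To make "RMS-based detection" concrete, the sketch below computes the root mean square of a list of samples and compares it with a threshold. It is a generic illustration, not Voxtype's implementation: the function names are hypothetical, samples are assumed to be normalized to the range -1.0 to 1.0 with one value per line, and how Voxtype maps its threshold setting onto signal energy is not specified here.

```shell
# Generic RMS energy check, for illustration only.
# Input: one normalized sample (-1.0 to 1.0) per line on stdin.
rms_energy() {
    awk '{ sum += $1 * $1; n++ }
         END { if (n > 0) printf "%.3f\n", sqrt(sum / n); else print "0.000" }'
}

# Succeeds (exit 0) when the RMS of stdin reaches the given threshold.
is_speech() {
    threshold="$1"
    rms=$(rms_energy)
    awk -v r="$rms" -v t="$threshold" 'BEGIN { exit !(r >= t) }'
}
```

A recording that is all zeros yields an RMS of 0.000 and fails the check for any positive threshold, which is the "silence-only" case VAD is meant to filter.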
[status] - Status Display Icons
Customize icons for Waybar/tray integrations:
[status]
icon_theme = "emoji" # or "nerd-font", "material", "minimal", "text", etc.
# Per-state overrides (optional)
[status.icons]
idle = "🎙️"
recording = "🎤"
transcribing = "⏳"
stopped = ""
Built-in themes:
Theme      Requires Font  Icons
emoji      No             🎙️ 🎤 ⏳
minimal    No             ○ ● ◐ ×
dots       No             ◯ ⬤ ◔ ◌
arrows     No             ▶ ● ↻ ■
text       No             [MIC] [REC] […] [OFF]
nerd-font  Nerd Font      (font glyphs)
material   MDI            (font glyphs)
phosphor   Phosphor       (font glyphs)
codicons   Codicons       (font glyphs)
omarchy    Nerd Font      (font glyphs)
[profiles] - Named Profiles
Context-specific configurations:
[profiles.slack]
post_process_command = "ollama run llama3.2:1b 'Format for Slack...'"
[profiles.code]
post_process_command = "ollama run llama3.2:1b 'Format as code comment...'"
output_mode = "clipboard"
Use with:
voxtype record start --profile slack
Environment Variable Overrides
All config options can be overridden via VOXTYPE_* environment variables:
VOXTYPE_MODEL=large-v3-turbo voxtype
VOXTYPE_AUTO_SUBMIT=true voxtype
VOXTYPE_WHISPER_API_KEY=sk-... voxtype
CLI Flag Overrides
CLI flags override both config file and environment variables:
voxtype --model large-v3-turbo --auto-submit --clipboard
Configuration Priority
Settings are applied in layers (highest priority wins):
CLI flags (highest)
Environment variables (VOXTYPE_*)
Config file (~/.config/voxtype/config.toml)
Built-in defaults (lowest)
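The layering can be expressed with nested shell default expansions. This is a sketch of the priority rule only: resolve_setting is a hypothetical function, and it treats an empty string as "not set", which may differ from how Voxtype handles explicitly empty values.

```shell
# Hypothetical illustration of the priority order: CLI > environment > config file > default.
# Each argument is the value from one layer; pass "" for a layer that does not set it.
resolve_setting() {
    cli_value="$1"
    env_value="$2"
    file_value="$3"
    default_value="$4"
    printf '%s\n' "${cli_value:-${env_value:-${file_value:-$default_value}}}"
}
```

For example, resolve_setting "" "large-v3-turbo" "small.en" "base.en" prints large-v3-turbo, because the environment layer outranks the config file when no CLI flag is given.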
Common Configuration Examples
Fast English Transcription
[whisper]
model = "base.en"
language = "en"
context_window_optimization = false
[output]
mode = "type"
auto_submit = true
Multilingual with GPU
[whisper]
model = "large-v3-turbo"
language = ["en", "fr", "de"]
threads = 8
gpu_isolation = false
[output]
mode = "type"
driver_order = ["wtype", "dotool", "clipboard"]
Non-US Keyboard Layout
[output]
mode = "type"
driver_order = ["dotool", "wtype", "clipboard"]
dotool_xkb_layout = "de"
dotool_xkb_variant = "nodeadkeys"
Compositor Keybindings with LLM Cleanup
[hotkey]
enabled = false
[whisper]
model = "base.en"
[output]
mode = "type"
pre_output_command = "hyprctl dispatch submap voxtype_suppress"
post_output_command = "hyprctl dispatch submap reset"
[output.post_process]
command = "ollama run llama3.2:1b 'Clean up this dictation:'"
timeout_ms = 30000
[text]
spoken_punctuation = true
replacements = { "vox type" = "voxtype" }
Next Steps
Output Modes - Deep dive into type, clipboard, paste, and file output
Text Processing - Post-process with LLMs and word replacements
Compositor Integration - Set up Hyprland, Sway, or River keybindings
Basic Usage - Return to the basic usage guide