Configuration

Voxtype uses a TOML configuration file to customize behavior. This guide covers all major configuration options.

Config File Location

Voxtype looks for configuration in this order:

1. Path specified via -c / --config flag
2. ~/.config/voxtype/config.toml (default)
3. /etc/voxtype/config.toml (system-wide)
4. Built-in defaults
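
The lookup order above can be sketched as a small shell function. This is only an illustration, not Voxtype's actual implementation: find_voxtype_config is a hypothetical helper, and falling through to the next location when an explicit path does not exist is an assumption (the real binary may report an error instead).

```shell
# Hypothetical helper: prints the first config file that exists, following the
# documented search order. No output (exit status 1) means built-in defaults apply.
find_voxtype_config() {
    explicit="${1:-}"   # path that would be passed via -c / --config (may be empty)
    for candidate in "$explicit" \
        "$HOME/.config/voxtype/config.toml" \
        "/etc/voxtype/config.toml"; do
        if [ -n "$candidate" ] && [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

Calling find_voxtype_config "" checks only the default and system-wide locations.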

Creating a Config File

# Create config directory
mkdir -p ~/.config/voxtype

# Write the current config (including defaults) to a file
voxtype config > ~/.config/voxtype/config.toml

Or copy the example:

cp config/default.toml ~/.config/voxtype/config.toml

Configuration Structure

The config file is organized into sections:

[hotkey] - Key bindings and activation mode
[audio] - Microphone and recording settings
[whisper] - Transcription model and language
[output] - Text output mode and drivers
[text] - Word replacements and spoken punctuation
[meeting] - Meeting transcription settings
[vad] - Voice Activity Detection
[status] - Status bar integration icons
[profiles] - Named configurations for different contexts

State File

Controls where the daemon writes its current state for external integrations (Waybar, scripts).

# Default: auto-detect location ($XDG_RUNTIME_DIR/voxtype/state)
state_file = "auto"

# Or specify a path
# state_file = "/tmp/voxtype-state"

# Or disable
# state_file = "disabled"

The state file contains one of "idle", "recording", "transcribing", or "outputting". It is required for the voxtype record toggle and voxtype status commands.
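
For external integrations, a script can read the state file directly. The sketch below is a hypothetical helper for a status-bar script, not part of Voxtype. It assumes the file holds the bare state word (optionally followed by a newline) and that the default location is the auto-detected path mentioned above.

```shell
# Hypothetical helper: prints the daemon state, or "unknown" if the state file
# is missing, unreadable, or holds an unexpected value.
read_voxtype_state() {
    state_file="${1:-${XDG_RUNTIME_DIR:-}/voxtype/state}"
    state=""
    if [ -r "$state_file" ]; then
        # Strip the trailing newline and any other whitespace
        state=$(tr -d '[:space:]' < "$state_file")
    fi
    case "$state" in
        idle|recording|transcribing|outputting) printf '%s\n' "$state" ;;
        *) printf 'unknown\n' ;;
    esac
}
```

Pass an explicit path as the first argument if state_file is set to a custom location such as /tmp/voxtype-state.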

[hotkey] - Key Bindings

Controls the built-in evdev hotkey detection.

Basic Configuration

[hotkey]
# Enable built-in hotkey (false if using compositor keybindings)
enabled = true

# Main hotkey (evdev key name)
key = "SCROLLLOCK"

# Optional modifier keys
modifiers = []  # e.g., ["LEFTCTRL", "LEFTALT"]

# Activation mode: "push_to_talk" or "toggle"
mode = "push_to_talk"

Available Keys

Common hotkeys:

SCROLLLOCK - Scroll Lock key (default)
PAUSE - Pause/Break key
RIGHTALT - Right Alt key
F13 through F24 - Extended function keys
MEDIA - Media key
RECORD - Record key
INSERT, HOME, END, PAGEUP, PAGEDOWN, DELETE

Use evtest to find key names:

sudo evtest
# Select keyboard, press key, note KEY_XXXXX name (without KEY_ prefix)

Numeric Keycodes

If your key isn’t in the built-in list, specify by keycode:

[hotkey]
# Use one of the following forms:
key = "WEV_234"        # XKB keycode from wev/xev
# key = "EVTEST_226"   # Kernel keycode from evtest
# key = "WEV_0xEA"     # Hex also works

Prefixes:

WEV_, X11_, XEV_ - XKB keycode (offset by 8 from kernel)
EVTEST_ - Kernel keycode
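
The offset between the two numbering schemes explains why WEV_234, WEV_0xEA, and EVTEST_226 all name the same key. The following sketch shows the arithmetic; keycode_to_kernel is a hypothetical helper written for illustration and is not a Voxtype command.

```shell
# Hypothetical helper: converts a prefixed keycode to the kernel (evdev) keycode.
keycode_to_kernel() {
    spec="$1"
    prefix="${spec%%_*}"   # text before the first underscore, e.g. "WEV"
    number="${spec#*_}"    # text after it, e.g. "234" or "0xEA"
    case "$prefix" in
        WEV|X11|XEV) echo $(( $number - 8 )) ;;   # XKB keycode = kernel keycode + 8
        EVTEST)      echo $(( $number )) ;;       # already a kernel keycode
        *) echo "unknown prefix: $prefix" >&2; return 1 ;;
    esac
}

keycode_to_kernel WEV_234      # prints 226
keycode_to_kernel WEV_0xEA     # prints 226 (0xEA is 234)
keycode_to_kernel EVTEST_226   # prints 226
```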

Cancel Key

Abort recording or transcription without outputting text:

[hotkey]
cancel_key = "ESC"  # Press Escape to cancel

Modifier Key for Secondary Model

Use a different model when holding a modifier:

[hotkey]
model_modifier = "LEFTSHIFT"  # Shift + hotkey uses secondary model

[whisper]
model = "base.en"
secondary_model = "large-v3-turbo"

[audio] - Audio Settings

[audio]
# Audio input device ("default" or device name from pactl)
device = "default"

# Sample rate in Hz (whisper expects 16000)
sample_rate = 16000

# Maximum recording duration in seconds (safety limit)
max_duration_secs = 60

Finding Device Names

pactl list sources short

Example:

[audio]
device = "alsa_input.usb-Blue_Microphones_Yeti-00.analog-stereo"

Audio Feedback

Sound cues when recording starts/stops:

[audio.feedback]
enabled = true
theme = "default"  # "default", "subtle", "mechanical", or path
volume = 0.7       # 0.0 to 1.0

Built-in themes:

default - Clear two-tone beeps
subtle - Quiet clicks
mechanical - Typewriter sounds

Custom theme:

[audio.feedback]
enabled = true
theme = "/home/user/.config/voxtype/sounds"
volume = 0.8

A custom theme directory must contain start.wav, stop.wav, and error.wav.
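
Before pointing theme at a directory, it can help to confirm the three files are present. This is a minimal sketch; check_sound_theme is a hypothetical helper, and it only checks that the files exist, not that they are valid WAV audio.

```shell
# Hypothetical helper: returns 0 if the directory has all three required sounds,
# otherwise lists the missing files on stderr and returns 1.
check_sound_theme() {
    theme_dir="$1"
    missing=0
    for name in start.wav stop.wav error.wav; do
        if [ ! -f "$theme_dir/$name" ]; then
            echo "missing: $theme_dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_sound_theme "$HOME/.config/voxtype/sounds"; then
    echo "theme directory looks complete"
fi
```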

[whisper] - Transcription

Basic Configuration

[whisper]
# Execution mode: "local", "remote", or "cli"
mode = "local"

# Model to use
model = "base.en"

# Language: "en", "auto", or ["en", "fr"] for constrained detection
language = "en"

# Translate non-English speech to English
translate = false

# Number of CPU threads (omit for auto-detect)
# threads = 4

Available Models

Model	Size	Speed	Accuracy	Languages
`tiny.en`	39 MB	Fastest	Good	English only
`base.en`	142 MB	Fast	Better	English only
`small.en`	466 MB	Medium	Great	English only
`medium.en`	1.5 GB	Slow	Excellent	English only
`large-v3`	3.1 GB	Slowest	Best	99 languages
`large-v3-turbo`	1.6 GB	Fast	Excellent	99 languages

Models with the .en suffix are English-only, but faster and more accurate for English. large-v3-turbo is recommended for GPU users.

Language Configuration

Three modes:

1. Single language (fastest, most accurate):

[whisper]
language = "en"  # or "fr", "es", "de", "ja", "zh", etc.

2. Auto-detect from all languages:

[whisper]
language = "auto"
model = "large-v3"  # Multilingual model required

3. Constrained auto-detect (recommended for multilingual users):

[whisper]
language = ["en", "fr"]  # Auto-detect between English and French
model = "large-v3"

Constrained detection is more accurate for short sentences where Whisper might misdetect the language.

Multi-Model Configuration

Configure multiple models:

[whisper]
model = "base.en"  # Primary model

# Secondary model (used with hotkey.model_modifier or CLI --model)
secondary_model = "large-v3-turbo"

# Additional available models
available_models = ["medium.en"]

# Model caching (only applies when gpu_isolation = false)
max_loaded_models = 2  # LRU eviction
cold_model_timeout_secs = 300  # 5 minutes before unloading idle models

Use via CLI:

voxtype record start --model large-v3-turbo

On-Demand Loading

Load the model only when recording starts:

[whisper]
on_demand_loading = true

Trade-off:

Pros: Saves memory/VRAM when idle
Cons: Slight delay at start of first recording

GPU Memory Isolation

Run transcription in a subprocess that exits after completion:

[whisper]
gpu_isolation = true

When to use:

Laptops with hybrid graphics (allows dGPU to sleep)
Limited VRAM (releases GPU memory between recordings)

Trade-off: Slightly slower per recording (model load/unload overhead)

Context Window Optimization

Use a smaller context window for short recordings:

[whisper]
context_window_optimization = false  # Disabled by default

Some models (especially large-v3 and large-v3-turbo) may get stuck in repetition loops with this enabled. Enable it only if it gives you faster transcription without such issues.

Initial Prompt

Provide context to improve accuracy:

[whisper]
initial_prompt = "Technical discussion about Rust, TypeScript, and Kubernetes."

Use for:

Domain-specific terminology
Proper nouns (names, products)
Formatting conventions

Remote Whisper Server

Send audio to a remote server:

[whisper]
mode = "remote"
remote_endpoint = "http://192.168.1.100:8080"  # whisper.cpp server
remote_model = "whisper-1"  # Model name to request
remote_timeout_secs = 30

# Optional: API key (or use VOXTYPE_WHISPER_API_KEY env var)
# remote_api_key = "sk-..."

OpenAI API:

[whisper]
mode = "remote"
remote_endpoint = "https://api.openai.com"
remote_model = "whisper-1"
remote_api_key = "sk-proj-..."

CLI Backend (whisper-cli)

Use a whisper-cli subprocess instead of FFI:

[whisper]
mode = "cli"
whisper_cli_path = "/usr/local/bin/whisper-cli"  # Optional

This is a fallback for systems where whisper-rs crashes (e.g., glibc 2.42+).

[output] - Text Output

Output Mode

[output]
# Primary output mode: "type", "clipboard", "paste", or "file"
mode = "type"

# Fall back to clipboard if typing fails
fallback_to_clipboard = true

Modes:

type - Simulates keyboard input at cursor (wtype → dotool → ydotool → clipboard)
clipboard - Copies text to clipboard (wl-clipboard)
paste - Copies to clipboard then simulates Ctrl+V
file - Writes to a file

See the Output Modes guide for details.

Driver Order

Customize the fallback chain for type mode:

[output]
driver_order = ["wtype", "dotool", "ydotool", "clipboard"]

Available drivers:

wtype - Wayland virtual keyboard (best Unicode/CJK support, no daemon)
dotool - uinput with keyboard layout support (no daemon)
ydotool - uinput fallback (requires ydotoold daemon)
clipboard - wl-clipboard (universal fallback)
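
To decide on a sensible driver_order for a given machine, you can check which tools are installed. The sketch below uses a hypothetical helper, first_available, that walks a list in order and prints the first command found on PATH. It only tests for presence; Voxtype's own fallback, as described above, happens when a driver fails at runtime, which this does not model.

```shell
# Hypothetical helper: prints the first command from the argument list that is
# installed, mirroring the order of a driver_order list.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1
}

# wl-copy is the copy tool shipped by wl-clipboard
first_available wtype dotool ydotool wl-copy || echo "no output driver found"
```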

Examples:

# Prefer ydotool over dotool
driver_order = ["wtype", "ydotool", "dotool", "clipboard"]

# Use only ydotool (no fallback)
driver_order = ["ydotool"]

Typing Delays

[output]
# Delay before typing starts (ms)
pre_type_delay_ms = 0

# Delay between each character (ms)
type_delay_ms = 0

Increase if characters are dropped or typed out of order.

Auto-Submit

Automatically press Enter after output:

[output]
auto_submit = true

Useful for chat apps, terminals, or forms.

Shift+Enter Newlines

Convert newlines to Shift+Enter instead of Enter:

[output]
shift_enter_newlines = true

Useful for apps where Enter submits (Slack, Discord, Cursor IDE).

Paste Mode Settings

[output]
mode = "paste"

# Keystroke for paste (default: "ctrl+v")
paste_keys = "ctrl+v"  # or "shift+insert", "ctrl+shift+v"

# Restore clipboard after paste
restore_clipboard = false
restore_clipboard_delay_ms = 200

File Output

[output]
mode = "file"
file_path = "/tmp/voxtype-output.txt"
file_mode = "overwrite"  # or "append"

Pre/Post Output Hooks

Run commands before/after typing:

[output]
# Run before typing starts
pre_output_command = "hyprctl dispatch submap voxtype_suppress"

# Run after typing completes
post_output_command = "hyprctl dispatch submap reset"

These hooks are useful for compositor integration, for example to block modifier keys while text is being typed.

dotool Keyboard Layout

For non-US keyboard layouts:

[output]
dotool_xkb_layout = "de"  # German
dotool_xkb_variant = "nodeadkeys"  # Optional

See the Output Modes guide for more details.

Notifications

[output.notification]
on_recording_start = false  # Notify when recording starts
on_recording_stop = false   # Notify when transcription begins
on_transcription = true     # Show transcribed text

Post-Processing Command

Pipe transcriptions through an external command:

[output.post_process]
command = "ollama run llama3.2:1b 'Clean up this dictation. Fix grammar, remove filler words:'"
timeout_ms = 30000  # 30 seconds

See the Text Processing guide for examples.
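
A post-processing command does not have to be an LLM. The sketch below shows the general shape of a filter, assuming the transcription arrives on standard input and the cleaned text is read from standard output (an assumption based on the word "pipe" above; check the Text Processing guide for the exact contract). normalize_spaces is a hypothetical example filter.

```shell
# Hypothetical filter: squeezes runs of spaces and trims both ends of each line.
normalize_spaces() {
    tr -s ' ' | sed -e 's/^ *//' -e 's/ *$//'
}

printf '  hello   world  \n' | normalize_spaces   # prints "hello world"
```

Saved as an executable script, a filter like this could be referenced from command in place of the ollama invocation.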

[text] - Text Processing

Word Replacements

Fix commonly misheard words:

[text]
replacements = { "vox type" = "voxtype", "oh marky" = "Omarchy" }

Matching is case-insensitive.

Spoken Punctuation

Convert spoken words to symbols:

[text]
spoken_punctuation = true

Examples:

“period” → .
“comma” → ,
“open paren” → (
“close paren” → )
“new line” → \n

See the Text Processing guide for the full list.

[meeting] - Meeting Transcription

Continuous transcription for meetings:

[meeting]
enabled = false
chunk_duration_secs = 30
storage_path = "auto"  # ~/.local/share/voxtype/meetings/
retain_audio = false
max_duration_mins = 180  # 3 hours

[meeting.audio]
mic_device = "default"
loopback_device = "auto"  # Capture remote participants
echo_cancel = "auto"  # Remove speaker bleed-through

[meeting.diarization]
enabled = true
backend = "simple"  # "simple", "ml", or "remote"
max_speakers = 10

[meeting.summary]
backend = "disabled"  # "local", "remote", or "disabled"
ollama_url = "http://localhost:11434"
ollama_model = "llama3.2"
timeout_secs = 120

See the Meeting Mode documentation for details.

[vad] - Voice Activity Detection

Filter silence-only recordings:

[vad]
enabled = false  # Enable VAD
backend = "auto"  # "auto", "energy", or "whisper"
threshold = 0.5   # 0.0 (sensitive) to 1.0 (aggressive)
min_speech_duration_ms = 100

Backends:

auto - Whisper VAD for Whisper engine, Energy VAD for ONNX engines
energy - RMS-based detection (no model needed)
whisper - Silero VAD via whisper-rs (requires model download)

[status] - Status Display Icons

Customize icons for Waybar/tray integrations:

[status]
icon_theme = "emoji"  # or "nerd-font", "material", "minimal", "text", etc.

# Per-state overrides (optional)
[status.icons]
idle = "🎙️"
recording = "🎤"
transcribing = "⏳"
stopped = ""

Built-in themes:

Theme	Requires Font	Icons
`emoji`	No	🎙️ 🎤 ⏳
`minimal`	No	○ ● ◐ ×
`dots`	No	◯ ⬤ ◔ ◌
`arrows`	No	▶ ● ↻ ■
`text`	No	[MIC] [REC] […] [OFF]
`nerd-font`	Nerd Font	(font glyphs)
`material`	MDI	(font glyphs)
`phosphor`	Phosphor	(font glyphs)
`codicons`	Codicons	(font glyphs)
`omarchy`	Nerd Font	(font glyphs)

[profiles] - Named Profiles

Context-specific configurations:

[profiles.slack]
post_process_command = "ollama run llama3.2:1b 'Format for Slack...'"

[profiles.code]
post_process_command = "ollama run llama3.2:1b 'Format as code comment...'"
output_mode = "clipboard"

Use with:

voxtype record start --profile slack

Environment Variable Overrides

All config options can be overridden via VOXTYPE_* environment variables:

VOXTYPE_MODEL=large-v3-turbo voxtype
VOXTYPE_AUTO_SUBMIT=true voxtype
VOXTYPE_WHISPER_API_KEY=sk-... voxtype

CLI Flag Overrides

CLI flags override both config file and environment variables:

voxtype --model large-v3-turbo --auto-submit --clipboard

Configuration Priority

Settings are applied in layers (highest priority wins):

CLI flags (highest)
Environment variables (VOXTYPE_*)
Config file (~/.config/voxtype/config.toml)
Built-in defaults (lowest)
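
The layering can be pictured as "first non-empty value wins". The following sketch is illustrative only: resolve_setting is a hypothetical helper, and treating an empty string as "not set" is an assumption made for the example, not a statement about how Voxtype parses its inputs.

```shell
# Hypothetical helper: prints the first non-empty argument.
# Argument order: CLI flag, environment variable, config file, built-in default.
resolve_setting() {
    for value in "$@"; do
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}

# No CLI flag, environment variable set, config file set, built-in default
resolve_setting "" "large-v3-turbo" "base.en" "base.en"   # prints large-v3-turbo
```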

Common Configuration Examples

Fast English Transcription

[whisper]
model = "base.en"
language = "en"
context_window_optimization = false

[output]
mode = "type"
auto_submit = true

Multilingual with GPU

[whisper]
model = "large-v3-turbo"
language = ["en", "fr", "de"]
threads = 8
gpu_isolation = false

[output]
mode = "type"
driver_order = ["wtype", "dotool", "clipboard"]

Non-US Keyboard Layout

[output]
mode = "type"
driver_order = ["dotool", "wtype", "clipboard"]
dotool_xkb_layout = "de"
dotool_xkb_variant = "nodeadkeys"

Compositor Keybindings with LLM Cleanup

[hotkey]
enabled = false

[whisper]
model = "base.en"

[output]
mode = "type"
pre_output_command = "hyprctl dispatch submap voxtype_suppress"
post_output_command = "hyprctl dispatch submap reset"

[output.post_process]
command = "ollama run llama3.2:1b 'Clean up this dictation:'"
timeout_ms = 30000

[text]
spoken_punctuation = true
replacements = { "vox type" = "voxtype" }

Next Steps

Output Modes - Deep dive into type, clipboard, paste, and file output
Text Processing - Post-process with LLMs and word replacements
Compositor Integration - Set up Hyprland, Sway, or River keybindings
Basic Usage - Return to the basic usage guide
