
Overview

The daemon command starts Voxtype in its primary mode, running as a foreground process that listens for hotkey events and performs voice-to-text transcription. When no subcommand is specified, voxtype defaults to voxtype daemon.
voxtype daemon
# or simply:
voxtype

Hotkey detection

The daemon supports two hotkey detection modes:
  1. Built-in evdev listener (default) - Kernel-level hotkey detection via evdev. Requires the user to be in the input group.
  2. Compositor keybindings - Use your window manager’s native keybinding system (recommended for Wayland). Disable built-in detection with --no-hotkey and configure the compositor to call voxtype record start, voxtype record stop, or voxtype record toggle.
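Because the built-in listener depends on membership in the input group, it can help to check that membership before starting the daemon. The following is a minimal sketch, not part of Voxtype: has_group is a hypothetical helper, and the check relies only on the standard id -nG command, which prints the current user's group names.

```shell
#!/bin/sh
# has_group is a hypothetical helper (not a Voxtype command).
# It succeeds if the group named in $1 appears in the
# space-separated group list given in $2.
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# id -nG prints the current user's group names on one line.
if has_group input "$(id -nG)"; then
    echo "In the input group: the built-in evdev listener should be usable."
else
    echo "Not in the input group: join it, or use --no-hotkey with compositor keybindings."
fi
```

Group membership changes usually take effect only after logging out and back in, so a fresh session may be needed before the check passes.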

Activation modes

Push-to-talk (default): hold the hotkey to record, release to transcribe. Toggle mode is enabled with the --toggle flag, described under Hotkey configuration.
voxtype daemon

Configuration

All daemon settings can be configured through the following sources, listed from highest to lowest priority:
  1. CLI flags (highest priority) - Override any setting for the current session
  2. Environment variables - Use VOXTYPE_* prefix (e.g., VOXTYPE_MODEL=large-v3-turbo)
  3. Config file - ~/.config/voxtype/config.toml
  4. Defaults - Built-in sensible defaults
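The priority order above can be sketched as a "first non-empty value wins" lookup. This is an illustration of the documented ordering only, not Voxtype's actual implementation; resolve_setting is a hypothetical helper and the values passed to it are placeholders.

```shell
#!/bin/sh
# resolve_setting is a hypothetical helper, not part of Voxtype.
# It prints the value that the documented priority order would select.
resolve_setting() {
    cli_value=$1      # value from a CLI flag, empty if the flag was not given
    env_value=$2      # value from a VOXTYPE_* variable, empty if unset
    file_value=$3     # value from config.toml, empty if the key is absent
    default_value=$4  # built-in default

    # Return the first non-empty value, highest priority first.
    for candidate in "$cli_value" "$env_value" "$file_value"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    printf '%s\n' "$default_value"
}

# No CLI flag given, so the environment value outranks the file and default.
# The last two arguments are placeholder strings, not real Voxtype defaults.
resolve_setting "" "large-v3-turbo" "value-from-file" "built-in-default"
```

In practice this means a flag such as --model applies only to the current invocation, while the config file holds the persistent choice.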

Configuration file

config
string
Path to configuration file. Defaults to ~/.config/voxtype/config.toml.
voxtype --config ~/custom-config.toml

Verbosity and logging

verbose
flag
Increase logging verbosity. Use -v for debug output, -vv for trace output.
voxtype -v    # Debug level
voxtype -vv   # Trace level
quiet
flag
Suppress all output except errors.
voxtype --quiet

Hotkey configuration

hotkey
string
Override the hotkey for recording. Examples: SCROLLLOCK, PAUSE, F13, MEDIA, WEV_234, EVTEST_226.
voxtype --hotkey PAUSE
Use wev (Wayland) or evtest (X11/Wayland) to discover key codes:
wev | grep -A4 "key event"
evtest /dev/input/eventX
toggle
flag
Use toggle mode instead of push-to-talk. Press hotkey once to start recording, press again to stop.
voxtype --toggle
no-hotkey
flag
Disable built-in hotkey detection. Use this when relying on compositor keybindings.
voxtype --no-hotkey
Then configure compositor keybindings:
# Hyprland example:
bind = SUPER, R, exec, voxtype record toggle
cancel-key
string
Key to abort recording or transcription without outputting. Examples: ESC, BACKSPACE, F12.
voxtype --cancel-key ESC
model-modifier
string
Modifier key for selecting secondary model during recording. Hold this key while activating the hotkey to use secondary_model.
voxtype --model-modifier LEFTSHIFT --secondary-model large-v3-turbo

Transcription engine

engine
string
Override transcription engine. Options: whisper, parakeet, moonshine, sensevoice, paraformer, dolphin, omnilingual.
voxtype --engine parakeet
model
string
Override model for transcription.
Whisper models:
  • tiny, tiny.en
  • base, base.en
  • small, small.en
  • medium, medium.en
  • large-v3, large-v3-turbo
Parakeet models:
  • parakeet-tdt-0.6b-v3
  • parakeet-tdt-0.6b-v3-int8
voxtype --model large-v3-turbo

Whisper options

initial-prompt
string
Provide context to guide transcription style, terminology, or formatting. Hints at proper nouns and conventions.
voxtype --initial-prompt "Technical documentation with code terms like React, TypeScript, API."
language
string
Language for transcription. Use auto for detection, or specify code(s): en, fr, es, etc. Supports comma-separated list for multilingual: en,fr,de.
voxtype --language en
voxtype --language "en,fr,es"  # Multilingual
translate
flag
Translate non-English speech to English during transcription.
voxtype --translate
threads
number
Number of CPU threads for inference. Default is automatic based on CPU cores.
voxtype --threads 4
gpu-isolation
flag
Run transcription in a subprocess that exits after completion, releasing GPU memory. Useful for preventing VRAM accumulation over multiple recordings.
voxtype --gpu-isolation
Adds slight overhead (~100-300ms) per transcription. Only use if experiencing GPU memory issues.
on-demand-loading
flag
Load model when recording starts instead of keeping it loaded in memory. Reduces idle memory usage at the cost of slower first transcription.
voxtype --on-demand-loading
no-whisper-context-optimization
flag
Disable automatic context window optimization for short recordings. By default, Voxtype uses smaller context windows for recordings under 10 seconds to improve speed.
voxtype --no-whisper-context-optimization
whisper-mode
string
Whisper execution mode: local (in-process), remote (API), or cli (external binary).
voxtype --whisper-mode remote
secondary-model
string
Model to use when holding the model_modifier key. Useful for switching to a larger/more accurate model for difficult audio.
voxtype --secondary-model large-v3-turbo --model-modifier LEFTSHIFT
eager-processing
flag
Start transcribing audio chunks while recording continues. Experimental feature for faster perceived response.
voxtype --eager-processing

Remote Whisper options

remote-endpoint
string
API endpoint URL for remote Whisper mode. Supports OpenAI-compatible APIs.
voxtype --whisper-mode remote --remote-endpoint https://api.openai.com/v1/audio/transcriptions
remote-model
string
Model name to send to remote API.
voxtype --remote-model whisper-1
remote-api-key
string
API key for remote server. Can also use VOXTYPE_WHISPER_API_KEY environment variable.
voxtype --remote-api-key sk-...
# Or:
export VOXTYPE_WHISPER_API_KEY=sk-...
voxtype --whisper-mode remote

Audio configuration

audio-device
string
Audio input device name. Use default for system default, or specify device name from pactl list sources (PulseAudio/PipeWire) or arecord -L (ALSA).
voxtype --audio-device "alsa_input.usb-Blue_Microphones_Yeti_Stereo_Microphone"
max-duration
number
Maximum recording duration in seconds (safety limit). Default is 300 seconds (5 minutes).
voxtype --max-duration 600  # 10 minutes
audio-feedback
flag
Enable audio feedback sounds (beeps when recording starts/stops).
voxtype --audio-feedback
no-audio-feedback
flag
Disable audio feedback sounds.
voxtype --no-audio-feedback

Output configuration

clipboard
flag
Force clipboard-only output mode. Transcribed text is copied to clipboard without typing.
voxtype --clipboard
paste
flag
Force paste mode: copy to clipboard and simulate Ctrl+V keystroke.
voxtype --paste
restore-clipboard
flag
Save clipboard content before paste mode and restore it after paste completes. Preserves your previous clipboard state.
voxtype --paste --restore-clipboard
restore-clipboard-delay-ms
number
Delay in milliseconds after paste before restoring clipboard. Default is 200ms. Increase it if the paste target has not finished reading the content before the clipboard is restored.
voxtype --restore-clipboard-delay-ms 500
pre-type-delay
number
Delay in milliseconds before typing starts. Helps prevent first character drop in some applications. Default is 0.
voxtype --pre-type-delay 100
type-delay
number
Delay between typed characters in milliseconds. Use for applications that can’t handle fast typing. Default is 0 (fastest).
voxtype --type-delay 10
append-text
string
Text to append after each transcription. Applied before auto_submit. Useful for adding trailing spaces or punctuation.
voxtype --append-text " "
driver
string
Output driver order for type mode (comma-separated). Available: wtype, dotool, ydotool, clipboard.
voxtype --driver ydotool,wtype,clipboard
Default order: wtype → dotool → ydotool → clipboard
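To preview which entry in a driver list is likely to be picked on a given machine, one rough approach is to walk the list and stop at the first tool found on PATH. The sketch below is an assumption about how such a fallback could work, not a description of Voxtype's internals: first_available_driver is a hypothetical helper, and treating clipboard as always available is a simplification.

```shell
#!/bin/sh
# first_available_driver is a hypothetical helper, not Voxtype's implementation.
# It walks a comma-separated driver list and prints the first entry whose
# binary is found on PATH. "clipboard" is treated as always available,
# which is an assumption made for this sketch.
first_available_driver() {
    old_ifs=$IFS
    IFS=,
    for driver in $1; do
        if [ "$driver" = "clipboard" ] || command -v "$driver" >/dev/null 2>&1; then
            IFS=$old_ifs
            printf '%s\n' "$driver"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Check the documented default order; the result depends on what is installed.
first_available_driver "wtype,dotool,ydotool,clipboard"
```

A tool being installed does not guarantee that it works under the current compositor, so the real daemon may still fall through to a later driver at runtime.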
auto-submit
flag
Automatically press Enter after outputting transcribed text.
voxtype --auto-submit
no-auto-submit
flag
Disable auto-submit (overrides config setting).
voxtype --no-auto-submit
shift-enter-newlines
flag
Convert newlines in transcription to Shift+Enter instead of regular Enter. Useful for chat applications that send on Enter.
voxtype --shift-enter-newlines
no-shift-enter-newlines
flag
Disable Shift+Enter newlines (overrides config).
voxtype --no-shift-enter-newlines
fallback-to-clipboard
flag
Fall back to clipboard if typing fails.
voxtype --fallback-to-clipboard
no-fallback-to-clipboard
flag
Disable clipboard fallback.
voxtype --no-fallback-to-clipboard
spoken-punctuation
flag
Enable spoken punctuation conversion. Say “period”, “comma”, “question mark” to insert punctuation.
voxtype --spoken-punctuation
paste-keys
string
Keystroke combination for paste mode. Examples: ctrl+v, shift+insert, ctrl+shift+v.
voxtype --paste-keys ctrl+shift+v
dotool-xkb-layout
string
Keyboard layout for dotool output (e.g., de, fr). Used when dotool is active.
voxtype --dotool-xkb-layout de
dotool-xkb-variant
string
Keyboard layout variant for dotool (e.g., nodeadkeys).
voxtype --dotool-xkb-variant nodeadkeys
file-path
string
File path for file output mode. Used with --file in record commands or when output mode is set to file.
voxtype --file-path ~/transcriptions.txt
file-mode
string
File write mode: overwrite or append.
voxtype --file-mode append
pre-output-command
string
Command to run before typing output. Useful for compositor submap switching.
voxtype --pre-output-command "hyprctl dispatch submap reset"
post-output-command
string
Command to run after typing output.
voxtype --post-output-command "notify-send 'Transcription complete'"
pre-recording-command
string
Command to run when recording starts. Useful for visual indicators or compositor state changes.
voxtype --pre-recording-command "hyprctl dispatch submap voxtype"

Voice Activity Detection (VAD)

vad
flag
Enable Voice Activity Detection to filter silence before transcription. Prevents Whisper hallucinations on silence-only recordings.
voxtype --vad
vad-threshold
number
Speech detection threshold (0.0-1.0). Lower values are more sensitive. Default is 0.5.
voxtype --vad --vad-threshold 0.3  # More sensitive
vad-backend
string
VAD backend to use: auto, energy, whisper.
  • auto - Whisper VAD for Whisper engine, Energy for ONNX engines
  • energy - Simple RMS-based detection, no model needed
  • whisper - Silero model via whisper-rs, requires model download
voxtype --vad --vad-backend whisper
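For intuition about the energy backend, RMS-based detection can be sketched as computing the root mean square of the audio samples and comparing it with a threshold. The example below is a generic illustration under stated assumptions, not Voxtype's code: rms and is_speech are hypothetical helpers, samples are assumed to be normalized to the range -1.0 to 1.0, and comparing the RMS level directly against the threshold is an assumption about how the value is applied.

```shell
#!/bin/sh
# rms and is_speech are hypothetical helpers for illustration only.

# rms SAMPLE... prints the root mean square of its arguments.
rms() {
    printf '%s\n' "$@" | awk '
        { sum += $1 * $1; n++ }
        END { printf "%.4f\n", (n ? sqrt(sum / n) : 0) }'
}

# is_speech THRESHOLD SAMPLE... succeeds when the RMS level exceeds the
# threshold. A lower threshold therefore classifies quieter audio as speech.
is_speech() {
    threshold=$1
    shift
    level=$(rms "$@")
    awk -v l="$level" -v t="$threshold" 'BEGIN { exit !(l > t) }'
}

if is_speech 0.3 0.5 -0.5 0.5 -0.5; then
    echo "speech detected"
else
    echo "silence"
fi
```

This matches the documented behavior that lower threshold values are more sensitive, since quieter audio clears a lower bar.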
vad-min-speech-ms
number
Minimum speech duration in milliseconds for VAD to consider audio as containing speech.
voxtype --vad --vad-min-speech-ms 500

Systemd service mode

Run Voxtype as a systemd user service for automatic startup:
# Install service
voxtype setup systemd

# Enable autostart
systemctl --user enable voxtype.service

# Start service
systemctl --user start voxtype.service

# Check status
voxtype setup systemd --status
The service runs voxtype daemon with settings from your config file.

Example configurations

Default settings with scroll lock hotkey:
voxtype
