voxtype daemon

Overview

The daemon command starts Voxtype in its primary mode, running as a foreground process that listens for hotkey events and performs voice-to-text transcription. When no subcommand is specified, voxtype defaults to voxtype daemon.

voxtype daemon
# or simply:
voxtype

Hotkey detection

The daemon supports two hotkey detection modes:

Built-in evdev listener (default) - Kernel-level hotkey detection via evdev. Requires your user to be in the input group.
Compositor keybindings - Uses your window manager’s native keybinding system (recommended for Wayland). Disable built-in detection with --no-hotkey and configure your compositor to call voxtype record start, voxtype record stop, or voxtype record toggle.
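Before relying on the built-in evdev listener, you can check whether your user already belongs to the input group. This is a minimal sketch using only standard tools (id, usermod); the helper name user_in_group is hypothetical and not part of Voxtype.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Voxtype): succeeds if USER belongs to GROUP.
user_in_group() {
  local user=$1 group=$2
  # id -nG prints the user's group names separated by spaces.
  id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx "$group"
}

if user_in_group "$(id -un)" input; then
  echo "OK: $(id -un) is in the input group"
else
  # Group changes only take effect after you log out and back in.
  echo "Not in the input group; run: sudo usermod -aG input $(id -un)"
fi
```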

Activation modes

Push-to-talk (default): hold the hotkey to record, release to transcribe.

voxtype daemon

Toggle: press once to start recording, press again to stop and transcribe.

voxtype daemon --toggle

Or in config.toml:

[hotkey]
mode = "toggle"
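The difference between the two modes comes down to how key events map to recording actions. The following sketch models that mapping as a pure function; it illustrates the behavior described above rather than Voxtype's implementation, and the labels push_to_talk and toggle are only names chosen for this example.

```bash
#!/usr/bin/env bash
# Illustrative model only: maps a hotkey EVENT (press|release) to an action,
# given the activation MODE and whether a recording is in progress (0|1).
next_action() {
  local mode=$1 event=$2 recording=$3
  case "$mode:$event" in
    push_to_talk:press)
      if (( recording )); then echo ignore; else echo start; fi ;;
    push_to_talk:release)
      # Releasing the key stops recording and triggers transcription.
      if (( recording )); then echo stop; else echo ignore; fi ;;
    toggle:press)
      # Each press flips between recording and transcribing.
      if (( recording )); then echo stop; else echo start; fi ;;
    *)
      # Toggle mode ignores key releases.
      echo ignore ;;
  esac
}

next_action push_to_talk press 0    # start
next_action push_to_talk release 1  # stop
next_action toggle press 1          # stop
```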

Configuration

All daemon settings can be configured through the following sources, listed from highest to lowest priority:

CLI flags - Override any setting for the current session
Environment variables - Use the VOXTYPE_* prefix (e.g., VOXTYPE_MODEL=large-v3-turbo)
Config file - ~/.config/voxtype/config.toml
Defaults - Built-in sensible defaults
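The precedence rules above amount to "the first source that provides a value wins". The sketch below models that rule for a single setting; it is an illustration, not Voxtype's implementation, and the config-file and default model values are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Illustrative model of the precedence order (not Voxtype's code):
# the first non-empty value wins, checked from highest to lowest priority.
resolve_setting() {
  local value
  for value in "$@"; do
    if [[ -n $value ]]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}

cli_model=""                 # e.g. from --model (not given here)
file_model="small.en"        # hypothetical value read from config.toml
default_model="base.en"      # hypothetical built-in default
VOXTYPE_MODEL=large-v3-turbo

resolve_setting "$cli_model" "${VOXTYPE_MODEL:-}" "$file_model" "$default_model"
# prints: large-v3-turbo (the environment variable beats the config file)
```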

Configuration file

config (string)

Path to the configuration file. Defaults to ~/.config/voxtype/config.toml.

voxtype --config ~/custom-config.toml

Verbosity and logging

verbose (flag)

Increase logging verbosity. Use -v for debug output, -vv for trace output.

voxtype -v    # Debug level
voxtype -vv   # Trace level

quiet (flag)

Suppress all output except errors.

voxtype --quiet

Hotkey configuration

hotkey (string)

Override the hotkey for recording. Examples: SCROLLLOCK, PAUSE, F13, MEDIA, WEV_234, EVTEST_226.

voxtype --hotkey PAUSE

Use wev (Wayland) or evtest (X11/Wayland) to discover key codes:

wev | grep -A4 "key event"
evtest /dev/input/eventX

toggle (flag)

Use toggle mode instead of push-to-talk. Press hotkey once to start recording, press again to stop.

voxtype --toggle

no-hotkey (flag)

Disable built-in hotkey detection. Use this when relying on compositor keybindings.

voxtype --no-hotkey

Then configure your compositor’s keybindings:

# Hyprland example:
bind = SUPER, R, exec, voxtype record toggle

cancel-key (string)

Key that aborts recording or transcription without outputting any text. Examples: ESC, BACKSPACE, F12.

voxtype --cancel-key ESC

model-modifier (string)

Modifier key that selects the secondary model. Hold this key while pressing the hotkey to transcribe with secondary_model instead of the primary model.

voxtype --model-modifier LEFTSHIFT --secondary-model large-v3-turbo

Transcription engine

engine (string)

Override the transcription engine. Options: whisper, parakeet, moonshine, sensevoice, paraformer, dolphin, omnilingual.

voxtype --engine parakeet

model (string)

Override the transcription model.

Whisper models:

tiny, tiny.en
base, base.en
small, small.en
medium, medium.en
large-v3, large-v3-turbo

Parakeet models:

parakeet-tdt-0.6b-v3
parakeet-tdt-0.6b-v3-int8

voxtype --model large-v3-turbo

Whisper options

initial-prompt (string)

Text that gives the model context to guide transcription style, terminology, and formatting, for example by hinting at proper nouns and conventions.

voxtype --initial-prompt "Technical documentation with code terms like React, TypeScript, API."

language (string)

Language for transcription. Use auto for automatic detection, or specify one or more language codes (en, fr, es, etc.). Pass a comma-separated list such as en,fr,de for multilingual transcription.

voxtype --language en
voxtype --language "en,fr,es"  # Multilingual

translate (flag)

Translate non-English speech to English during transcription.

voxtype --translate

threads (number)

Number of CPU threads used for inference. By default, this is chosen automatically based on the number of CPU cores.

voxtype --threads 4

gpu-isolation (flag)

Run transcription in a subprocess that exits after completion, releasing GPU memory. Useful for preventing VRAM accumulation over multiple recordings.

voxtype --gpu-isolation

This adds a small overhead (roughly 100-300 ms) to each transcription, so only enable it if you are experiencing GPU memory issues.

on-demand-loading (flag)

Load the model when recording starts instead of keeping it in memory. This reduces idle memory usage at the cost of a slower first transcription.

voxtype --on-demand-loading

no-whisper-context-optimization (flag)

Disable automatic context window optimization for short recordings. By default, Voxtype uses smaller context windows for recordings under 10 seconds to improve speed.

voxtype --no-whisper-context-optimization
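The 10-second cutoff is easy to express in audio samples. Whisper models consume 16 kHz mono audio, so a recording counts as short when it has fewer than 160,000 samples. The helper below is a hypothetical illustration of that arithmetic, not Voxtype's code.

```bash
#!/usr/bin/env bash
# Whisper expects 16 kHz mono input.
SAMPLE_RATE=16000
SHORT_RECORDING_SECS=10   # cutoff described above

# Hypothetical helper: succeeds if SAMPLES covers less than 10 seconds of audio.
is_short_recording() {
  local samples=$1
  (( samples < SHORT_RECORDING_SECS * SAMPLE_RATE ))
}

is_short_recording 80000 && echo "5 s: smaller context window"
is_short_recording 480000 || echo "30 s: default context window"
```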

whisper-mode (string)

Whisper execution mode: local (in-process), remote (API), or cli (external binary).

voxtype --whisper-mode remote

secondary-model (string)

Model to use while the model_modifier key is held. Useful for switching to a larger, more accurate model for difficult audio.

voxtype --secondary-model large-v3-turbo --model-modifier LEFTSHIFT

eager-processing (flag)

Start transcribing audio chunks while recording is still in progress. This experimental feature aims to reduce perceived latency.

voxtype --eager-processing

Remote Whisper options

remote-endpoint (string)

API endpoint URL for remote Whisper mode. Supports OpenAI-compatible APIs.

voxtype --whisper-mode remote --remote-endpoint https://api.openai.com/v1/audio/transcriptions

remote-model (string)

Model name to send to the remote API.

voxtype --remote-model whisper-1

remote-api-key (string)

API key for the remote server. You can also provide it through the VOXTYPE_WHISPER_API_KEY environment variable.

voxtype --remote-api-key sk-...
# Or:
export VOXTYPE_WHISPER_API_KEY=sk-...
voxtype --whisper-mode remote
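Before pointing Voxtype at a remote endpoint, it can help to confirm that the endpoint and key work on their own. The sketch below sends one audio file with curl, assuming an OpenAI-compatible /v1/audio/transcriptions endpoint that accepts multipart file and model fields, as OpenAI's API does. The helper check_remote_endpoint and the file name sample.wav are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: POST an audio file to an OpenAI-compatible
# transcription endpoint and print the JSON response.
ENDPOINT=${ENDPOINT:-https://api.openai.com/v1/audio/transcriptions}
MODEL=${MODEL:-whisper-1}

check_remote_endpoint() {
  local audio_file=$1
  if [[ -z ${VOXTYPE_WHISPER_API_KEY:-} ]]; then
    echo "error: VOXTYPE_WHISPER_API_KEY is not set" >&2
    return 1
  fi
  if [[ ! -f $audio_file ]]; then
    echo "error: $audio_file not found" >&2
    return 1
  fi
  curl -sS "$ENDPOINT" \
    -H "Authorization: Bearer $VOXTYPE_WHISPER_API_KEY" \
    -F "file=@$audio_file" \
    -F "model=$MODEL"
}

# Usage: check_remote_endpoint sample.wav
```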

Audio configuration

audio-device (string)

Audio input device name. Use default for the system default device, or specify a device name from pactl list sources (PulseAudio/PipeWire) or arecord -L (ALSA).

voxtype --audio-device "alsa_input.usb-Blue_Microphones_Yeti_Stereo_Microphone"
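To find a value for --audio-device on PulseAudio or PipeWire, you can list sources with pactl list short sources, whose second tab-separated column is the source name. This sketch extracts that column and drops names ending in .monitor, which capture playback rather than a microphone; the helper name list_input_sources is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `pactl list short sources` output on stdin and
# print only capture-device names (column 2), skipping playback monitors.
list_input_sources() {
  awk -F '\t' '$2 !~ /\.monitor$/ { print $2 }'
}

# Usage on a live system:
#   pactl list short sources | list_input_sources
```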

max-duration (number)

Maximum recording duration in seconds (safety limit). Default is 300 seconds (5 minutes).

voxtype --max-duration 600  # 10 minutes

audio-feedback (flag)

Enable audio feedback sounds (beeps when recording starts/stops).

voxtype --audio-feedback

no-audio-feedback (flag)

Disable audio feedback sounds.

voxtype --no-audio-feedback

Output configuration

clipboard (flag)

Force clipboard-only output mode. Transcribed text is copied to the clipboard without being typed.

voxtype --clipboard

paste (flag)

Force paste mode: copy the text to the clipboard, then simulate a Ctrl+V keystroke.

voxtype --paste

restore-clipboard (flag)

Save the clipboard contents before pasting and restore them after the paste completes, so your previous clipboard state is preserved.

voxtype --paste --restore-clipboard

restore-clipboard-delay-ms (number)

Delay in milliseconds between the paste and restoring the clipboard. Default is 200 ms. Increase it if the target application has not yet read the pasted content when the clipboard is restored.

voxtype --restore-clipboard-delay-ms 500
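The restore sequence behind --restore-clipboard and --restore-clipboard-delay-ms can be modeled in a few steps. This sketch simulates the clipboard with a shell variable so it runs anywhere; on a real Wayland session the save and restore steps would typically use tools such as wl-paste and wl-copy, which is an assumption about tooling rather than a description of Voxtype's internals.

```bash
#!/usr/bin/env bash
# Simulated clipboard; stands in for the real system clipboard.
clipboard="previous contents"

# Hypothetical model of paste mode with clipboard restore.
paste_with_restore() {
  local text=$1 delay_ms=${2:-200}   # 200 ms matches the documented default
  local saved=$clipboard             # 1. save the current clipboard
  clipboard=$text                    # 2. copy the transcription
  echo "paste: $clipboard"           # 3. stand-in for the paste keystroke
  # 4. give the target application time to read the clipboard
  sleep "$(awk -v ms="$delay_ms" 'BEGIN { print ms / 1000 }')"
  clipboard=$saved                   # 5. restore the previous contents
}

paste_with_restore "hello world" 50
echo "clipboard now: $clipboard"     # clipboard now: previous contents
```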

pre-type-delay (number)

Delay in milliseconds before typing starts. Helps prevent the first characters from being dropped in some applications. Default is 0.

voxtype --pre-type-delay 100

type-delay (number)

Delay between typed characters in milliseconds. Use for applications that can’t handle fast typing. Default is 0 (fastest).

voxtype --type-delay 10

append-text (string)

Text to append after each transcription. It is applied before auto_submit presses Enter. Useful for adding trailing spaces or punctuation.

voxtype --append-text " "
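The ordering of append_text and auto_submit can be shown with a small model of the final output. In this sketch the Enter key press is represented as the token <Enter>; the function build_output is hypothetical and only illustrates the documented order.

```bash
#!/usr/bin/env bash
# Illustrative model of the output order: transcription, then append_text,
# then (if auto-submit is on) an Enter key press, shown here as <Enter>.
build_output() {
  local text=$1 append_text=$2 auto_submit=$3
  local out="${text}${append_text}"
  if (( auto_submit )); then
    out+="<Enter>"
  fi
  printf '%s\n' "$out"
}

build_output "Hello world" "." 1   # Hello world.<Enter>
```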

driver (string)

Comma-separated list of output drivers to try, in order, in type mode. Available drivers: wtype, dotool, ydotool, clipboard.

voxtype --driver ydotool,wtype,clipboard

Default order: wtype → dotool → ydotool → clipboard
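Driver ordering works as a fallback chain. The sketch below picks the first driver in a comma-separated list whose command is installed; it is a hypothetical model that treats clipboard as always available and does not model runtime failures, which Voxtype may also fall back on.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of driver selection: return the first driver in the
# list whose command is installed. "clipboard" is treated as always available.
first_available_driver() {
  local driver
  for driver in "$@"; do
    if [[ $driver == clipboard ]] || command -v "$driver" >/dev/null 2>&1; then
      printf '%s\n' "$driver"
      return 0
    fi
  done
  return 1
}

# Split a --driver style comma-separated list into an array.
IFS=, read -r -a drivers <<< "wtype,dotool,ydotool,clipboard"
first_available_driver "${drivers[@]}"
```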

auto-submit (flag)

Automatically press Enter after outputting transcribed text.

voxtype --auto-submit

no-auto-submit (flag)

Disable auto-submit, overriding the config file setting.

voxtype --no-auto-submit

shift-enter-newlines (flag)

Type newlines in the transcription as Shift+Enter instead of a regular Enter. Useful for chat applications that send the message on Enter.

voxtype --shift-enter-newlines
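The effect of this flag on multi-line text can be illustrated as a keystroke plan. The function to_keystrokes below is a hypothetical model, not Voxtype's typing code; typed text appears as "type: ..." lines and key presses as <...> tokens.

```bash
#!/usr/bin/env bash
# Illustrative model: turn multi-line text into a keystroke plan, pressing
# Shift+Enter (or plain Enter) between lines. Key presses are shown as <...>.
to_keystrokes() {
  local shift_enter=$1 text=$2 newline_key=Enter line first=1
  if (( shift_enter )); then newline_key=Shift+Enter; fi
  while IFS= read -r line; do
    if (( ! first )); then echo "<$newline_key>"; fi
    first=0
    if [[ -n $line ]]; then echo "type: $line"; fi
  done <<< "$text"
}

to_keystrokes 1 $'First line\nSecond line'
```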

no-shift-enter-newlines (flag)

Disable Shift+Enter newlines, overriding the config file setting.

voxtype --no-shift-enter-newlines

fallback-to-clipboard (flag)

Fall back to the clipboard if typing fails.

voxtype --fallback-to-clipboard

no-fallback-to-clipboard (flag)

Disable clipboard fallback.

voxtype --no-fallback-to-clipboard

spoken-punctuation (flag)

Enable spoken punctuation conversion. Say “period”, “comma”, or “question mark” to insert the corresponding punctuation mark.

voxtype --spoken-punctuation
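A simplified version of spoken punctuation is a set of word-to-symbol substitutions that also remove the space before each symbol. This sed sketch is not Voxtype's implementation, covers only the three words named above, and assumes GNU sed for the \b word-boundary escape.

```bash
#!/usr/bin/env bash
# Simplified sketch of spoken punctuation (not Voxtype's implementation):
# replace spoken words with symbols and remove the space before each symbol.
# Requires GNU sed for \b word boundaries.
spoken_punctuation() {
  sed -E \
    -e 's/ ?\bquestion mark\b/?/g' \
    -e 's/ ?\bperiod\b/./g' \
    -e 's/ ?\bcomma\b/,/g'
}

echo "hello comma is this working question mark" | spoken_punctuation
# hello, is this working?
```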

paste-keys (string)

Keystroke combination for paste mode. Examples: ctrl+v, shift+insert, ctrl+shift+v.

voxtype --paste-keys ctrl+shift+v

dotool-xkb-layout (string)

Keyboard layout for dotool output (e.g., de, fr). Only used when dotool is the active output driver.

voxtype --dotool-xkb-layout de

dotool-xkb-variant (string)

Keyboard layout variant for dotool (e.g., nodeadkeys).

voxtype --dotool-xkb-variant nodeadkeys

file-path (string)

Path of the output file for file output mode. Used with --file in record commands or when the output mode is set to file.

voxtype --file-path ~/transcriptions.txt

file-mode (string)

File write mode: overwrite or append.

voxtype --file-mode append
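The two file modes correspond to the familiar shell redirections > (overwrite) and >> (append). This is a minimal model; the function name is hypothetical, and writing a trailing newline after each transcription is an assumption made here.

```bash
#!/usr/bin/env bash
# Illustrative model of the two file modes. Whether Voxtype adds a trailing
# newline after each transcription is an assumption made here.
write_transcription() {
  local path=$1 mode=$2 text=$3
  case $mode in
    overwrite) printf '%s\n' "$text" >  "$path" ;;  # replace previous contents
    append)    printf '%s\n' "$text" >> "$path" ;;  # keep a running log
    *) echo "unknown file mode: $mode" >&2; return 1 ;;
  esac
}
```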

pre-output-command (string)

Command to run before the transcription is output. Useful for switching compositor submaps.

voxtype --pre-output-command "hyprctl dispatch submap reset"

post-output-command (string)

Command to run after the transcription is output.

voxtype --post-output-command "notify-send 'Transcription complete'"

pre-recording-command (string)

Command to run when recording starts. Useful for visual indicators or compositor state changes.

voxtype --pre-recording-command "hyprctl dispatch submap voxtype"
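The three hook options fire at different points in a session. The model below shows the order implied by their descriptions; running each hook through sh -c is an assumption about how commands are executed, and the echo commands stand in for real ones such as hyprctl or notify-send.

```bash
#!/usr/bin/env bash
# Stand-in hook commands; real setups might call hyprctl or notify-send.
PRE_RECORDING_COMMAND='echo "[hook] pre-recording"'
PRE_OUTPUT_COMMAND='echo "[hook] pre-output"'
POST_OUTPUT_COMMAND='echo "[hook] post-output"'

# Run a hook only if it is configured.
run_hook() {
  if [[ -n $1 ]]; then sh -c "$1"; fi
}

# Hypothetical model of one recording session.
simulate_session() {
  run_hook "$PRE_RECORDING_COMMAND"   # recording starts
  echo "recording, then transcribing"
  run_hook "$PRE_OUTPUT_COMMAND"      # just before the text is output
  echo "type: $1"
  run_hook "$POST_OUTPUT_COMMAND"     # after the text is output
}

simulate_session "hello"
```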

Voice Activity Detection (VAD)

vad (flag)

Enable Voice Activity Detection to filter out silence before transcription. This prevents Whisper from hallucinating text on recordings that contain only silence.

voxtype --vad

vad-threshold (number)

Speech detection threshold (0.0-1.0). Lower values are more sensitive. Default is 0.5.

voxtype --vad --vad-threshold 0.3  # More sensitive

vad-backend (string)

VAD backend to use: auto, energy, whisper.

auto - Whisper VAD for Whisper engine, Energy for ONNX engines
energy - Simple RMS-based detection, no model needed
whisper - Silero model via whisper-rs, requires model download

voxtype --vad --vad-backend whisper
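The RMS-based detection used by the energy backend can be illustrated in a few lines of awk. This is a simplified sketch, not Voxtype's implementation: it computes the root mean square of normalized samples (one value in [-1, 1] per line) and compares it with a threshold. The threshold scale here is an assumption and need not match --vad-threshold.

```bash
#!/usr/bin/env bash
# Simplified energy-based VAD: succeeds if the RMS of the samples on stdin
# (one normalized float per line) reaches THRESHOLD.
is_speech() {
  awk -v threshold="$1" '
    { sum += $1 * $1; n++ }
    END {
      if (n == 0) exit 1               # no audio at all
      rms = sqrt(sum / n)
      if (rms >= threshold + 0) exit 0
      exit 1
    }'
}

printf '%s\n' 0.30 -0.25 0.28 -0.31 | is_speech 0.02 && echo "speech"
printf '%s\n' 0.001 -0.002 0.001 | is_speech 0.02 || echo "silence"
```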

vad-min-speech-ms (number)

Minimum speech duration in milliseconds for VAD to consider audio as containing speech.

voxtype --vad --vad-min-speech-ms 500

Systemd service mode

Run Voxtype as a systemd user service for automatic startup:

# Install service
voxtype setup systemd

# Enable autostart
systemctl --user enable voxtype.service

# Start service
systemctl --user start voxtype.service

# Check status
voxtype setup systemd --status

The service runs voxtype daemon with settings from your config file.

Example configurations

Run with default settings, using the Scroll Lock hotkey:

voxtype

Use the OpenAI API for transcription:

export VOXTYPE_WHISPER_API_KEY=sk-...
voxtype --whisper-mode remote \
  --remote-endpoint https://api.openai.com/v1/audio/transcriptions \
  --remote-model whisper-1

Disable the built-in hotkey and use Hyprland keybindings instead:

voxtype --no-hotkey

In ~/.config/hypr/hyprland.conf:

bind = SUPER, R, exec, voxtype record toggle

Paste via the clipboard, then restore the previous clipboard contents:

voxtype --paste --restore-clipboard

Transcribe English, French, or Spanish with VAD:

voxtype --language "en,fr,es" --vad

Release GPU memory after each transcription:

voxtype --gpu-isolation --on-demand-loading

See also

Core Commands
Configuration
Meeting Mode
