Basic Usage

Voxtype is a push-to-talk voice-to-text tool for Linux. This guide covers the essential workflows for daily use.

The Push-to-Talk Workflow

1. Start the daemon: Run voxtype in a terminal or enable the systemd service
2. Hold your hotkey: Default is ScrollLock
3. Speak clearly: Talk at a normal pace
4. Release the hotkey: Your speech is transcribed
5. Text appears: Either typed at cursor or copied to clipboard

For best results, speak naturally and avoid long pauses. Whisper works best with continuous speech rather than isolated words.

Running the Daemon

Foreground Mode

Run the daemon directly in a terminal:

voxtype                     # Run with defaults
voxtype -v                  # Verbose output
voxtype -vv                 # Debug output
voxtype --clipboard         # Force clipboard mode
voxtype --model small.en    # Use a different model
voxtype --hotkey PAUSE      # Use a different hotkey

Systemd Service

For automatic startup on login:

# Install the systemd service
voxtype setup systemd

# Start the service
systemctl --user start voxtype

# Enable on login
systemctl --user enable voxtype

# Check status
voxtype setup systemd --status

The service runs in the background and logs to the systemd journal:

journalctl --user -u voxtype -f  # Follow logs

Push-to-Talk vs Toggle Mode

Voxtype supports two activation modes:

Push-to-Talk (Default)

Hold the hotkey to record, release to transcribe:

[hotkey]
key = "SCROLLLOCK"
mode = "push_to_talk"  # Default

This mode provides precise control over recording duration. It’s ideal when you know exactly what you want to say.

Toggle Mode

Press once to start recording, press again to stop:

[hotkey]
key = "SCROLLLOCK"
mode = "toggle"

Or use the CLI flag:

voxtype --toggle

Toggle mode is useful for longer dictation where holding a key would be uncomfortable.

Using the Built-in Hotkey

The built-in hotkey uses evdev (Linux input subsystem) to detect key presses. This works on both Wayland and X11.

Prerequisites

You must be in the input group:

# Add yourself to the input group
sudo usermod -aG input "$USER"

# Log out and back in for changes to take effect

# Verify membership
groups | grep input
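
The `grep input` check above also matches any group whose name merely contains the string (for example `uinput`). As a minimal sketch of a stricter check, the hypothetical helper `has_input_group` below tests for `input` as a whole word in a space-separated group list. The helper name is illustrative and not part of Voxtype.

```shell
# Hypothetical helper (not part of Voxtype): succeed only if the
# space-separated group list in $1 contains "input" as a whole word.
has_input_group() {
    case " $1 " in
        *" input "*) return 0 ;;
        *) return 1 ;;
    esac
}

# id -nG prints the current user's group names separated by spaces.
if has_input_group "$(id -nG)"; then
    echo "input group: ok"
else
    echo "input group: missing (add yourself, then log out and back in)"
fi
```

Because group changes only apply to new login sessions, this may still report the group as missing until you have logged out and back in.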

Default Hotkey

The default hotkey is ScrollLock. It was chosen because:

It’s rarely used for other purposes
It doesn’t interfere with normal typing
It’s available on most keyboards

Changing the Hotkey

Edit ~/.config/voxtype/config.toml:

[hotkey]
key = "PAUSE"  # Or F13-F24, RIGHTALT, etc.

See the Configuration guide for all available keys.

Using Compositor Keybindings

Instead of the built-in evdev hotkey, you can use your compositor’s native keybindings. This has several advantages:

No input group membership required
Use any key combination (e.g., Super+V)
Native feel with familiar keybinding configuration

Setup

Disable the built-in hotkey in ~/.config/voxtype/config.toml:

[hotkey]
enabled = false

Configure your compositor. See the Compositor Integration guide for detailed setup instructions.

Canceling Recording or Transcription

You can cancel an active recording or transcription in progress without outputting any text.

With Built-in Hotkey

Configure a cancel key in your config:

[hotkey]
key = "SCROLLLOCK"
cancel_key = "ESC"  # Press Escape to cancel

Common cancel keys:

ESC - Escape key
BACKSPACE - Backspace key
F12 - Function key

With Compositor Keybindings

Bind a key to the cancel command:

voxtype record cancel

Example (Hyprland):

bind = , ESCAPE, exec, voxtype record cancel

Example (Sway):

bindsym Escape exec voxtype record cancel

If voxtype crashes while typing, the cancel key (or command) will not work. Use the compositor’s escape mechanism if available (see the Compositor Integration guide).

Model Selection and Switching

Voxtype supports multiple Whisper models with different trade-offs between speed and accuracy.

Default Model

The default model is base.en, which provides a good balance:

[whisper]
model = "base.en"

Available Models

Model	Size	Speed	Accuracy	Languages
`tiny.en`	39 MB	Fastest	Good	English only
`base.en`	142 MB	Fast	Better	English only
`small.en`	466 MB	Medium	Great	English only
`medium.en`	1.5 GB	Slow	Excellent	English only
`large-v3`	3.1 GB	Slowest	Best	99 languages
`large-v3-turbo`	1.6 GB	Fast	Excellent	99 languages (GPU recommended)

Switching Models

Change the model in your config:

[whisper]
model = "large-v3-turbo"

Or override at runtime:

voxtype --model large-v3-turbo
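
If you pass model names from a script, a typo is easy to miss. The sketch below guards against that by accepting only the names listed in the Available Models table. The helper `is_listed_model` is hypothetical, and the list is an assumption drawn from that table alone: Voxtype may accept other names that this check would reject.

```shell
# Hypothetical helper: succeed only for model names listed in the
# Available Models table of this guide (the list may not be exhaustive).
is_listed_model() {
    case "$1" in
        tiny.en|base.en|small.en|medium.en|large-v3|large-v3-turbo) return 0 ;;
        *) return 1 ;;
    esac
}

model="small.en"
if is_listed_model "$model"; then
    # At this point the name could be passed on, e.g. voxtype --model "$model"
    echo "ok: $model"
else
    echo "not in the documented model list: $model" >&2
fi
```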

Interactive Model Selection

Use the setup command for guided selection:

voxtype setup model

This will:

Show all available models
Download your selection if needed
Update your config file
Optionally restart the daemon

Multi-Model Support

You can configure multiple models and switch between them:

[whisper]
model = "base.en"  # Primary model
secondary_model = "large-v3-turbo"  # For difficult audio
available_models = ["medium.en"]  # Additional models

See the Configuration guide for details on multi-model configuration.

Audio Feedback

Enable audio cues when recording starts and stops:

[audio.feedback]
enabled = true
theme = "default"  # or "subtle", "mechanical"
volume = 0.7       # 0.0 to 1.0

Built-in Themes

default - Clear, pleasant two-tone beeps
subtle - Quiet, unobtrusive clicks
mechanical - Typewriter/keyboard-like sounds

Custom Themes

Point to a directory containing start.wav, stop.wav, and error.wav:

[audio.feedback]
enabled = true
theme = "/home/user/.config/voxtype/sounds"
volume = 0.8
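
Before pointing `theme` at a custom directory, it can help to confirm that all three sound files are present. The following is a minimal sketch: `check_theme_dir` is a hypothetical helper, and the directory path simply mirrors the example above. It only checks that the files exist, not that they are valid WAV audio.

```shell
# Hypothetical helper: print the name of each required sound file that is
# missing from directory $1. Returns 0 only when all three files exist.
check_theme_dir() {
    missing=0
    for name in start.wav stop.wav error.wav; do
        if [ ! -f "$1/$name" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}

theme_dir="$HOME/.config/voxtype/sounds"   # example path, adjust to your setup
if missing_files="$(check_theme_dir "$theme_dir")"; then
    echo "theme directory looks complete"
else
    echo "theme directory is missing:"
    echo "$missing_files"
fi
```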

Example Session

$ voxtype
[INFO] Voxtype v0.7.0 starting...
[INFO] Using model: base.en
[INFO] Hotkey: SCROLLLOCK
[INFO] Output mode: type (fallback: clipboard)
[INFO] Ready! Hold SCROLLLOCK to record.

# User holds ScrollLock and says "Hello world"
[INFO] Recording started...
[INFO] Recording stopped (1.2s)
[INFO] Transcribing...
[INFO] Transcribed: "Hello world"
[INFO] Typed 11 characters
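
If you want only the transcribed text from output like the session above, a small filter is enough. This is a sketch that assumes log lines follow the `[INFO] Transcribed: "..."` shape shown in the example; the real format may differ between versions, and `extract_transcriptions` is a hypothetical helper name.

```shell
# Hypothetical helper: read log lines on stdin and print only the text
# between the quotes on lines containing: [INFO] Transcribed: "..."
# Any prefix before [INFO] (such as a journal timestamp) is ignored.
extract_transcriptions() {
    sed -n 's/.*[[]INFO] Transcribed: "\(.*\)"$/\1/p'
}

# Example using two lines copied from the session above.
printf '%s\n' \
    '[INFO] Transcribing...' \
    '[INFO] Transcribed: "Hello world"' | extract_transcriptions
# prints: Hello world
```

When Voxtype runs as a service, the same filter could be fed from `journalctl --user -u voxtype`, provided the journal entries carry the same message text.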

Common CLI Options

Option	Description
`-v, -vv`	Increase verbosity (debug, trace)
`-q, --quiet`	Quiet mode (errors only)
`--clipboard`	Force clipboard mode
`--paste`	Force paste mode (clipboard + Ctrl+V)
`--model <MODEL>`	Override transcription model
`--engine <ENGINE>`	Override transcription engine (whisper, parakeet, moonshine, etc.)
`--hotkey <KEY>`	Override hotkey
`--toggle`	Use toggle mode
`--no-hotkey`	Disable built-in hotkey (use compositor keybindings)
`-c, --config <FILE>`	Use custom config file

Next Steps

Compositor Integration

Set up push-to-talk with Hyprland, Sway, or River

Configuration

Customize hotkeys, models, and output settings

Output Modes

Choose between typing, clipboard, paste, and file output

Text Processing

Post-process transcriptions with LLMs and word replacements
