Quick start

Get Voxtype installed and transcribing speech in just a few steps.

Installation

Install Voxtype

Choose your distribution’s package manager or install from source.

Arch Linux

yay -S voxtype
# or
yay -S voxtype-bin

Debian/Ubuntu

# The * is a placeholder: wget does not expand wildcards in HTTPS URLs,
# so substitute the exact file name listed on the releases page.
wget https://github.com/peteonrails/voxtype/releases/latest/download/voxtype_*_amd64.deb
sudo dpkg -i voxtype_*_amd64.deb

Fedora

# The * is a placeholder: wget does not expand wildcards in HTTPS URLs,
# so substitute the exact file name listed on the releases page.
wget https://github.com/peteonrails/voxtype/releases/latest/download/voxtype-*.x86_64.rpm
sudo dnf install voxtype-*.x86_64.rpm

From source

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone and build
git clone https://github.com/peteonrails/voxtype.git
cd voxtype
cargo build --release

# Binary is at: target/release/voxtype

See the Installation page for all options including GPU acceleration variants.
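If you are unsure which of the four install paths above applies, the distribution ID in /etc/os-release is usually enough to decide. The following is a minimal sketch, not part of Voxtype: the function name install_tab_for is hypothetical, and the mapping from ID values to the headings in this guide is an assumption (for example, it treats anything that lists fedora in ID_LIKE as Fedora).

```shell
#!/bin/sh
# Map an os-release ID (and optional ID_LIKE list) to one of the
# install paths named in this guide. Hypothetical helper.
install_tab_for() {
    # $1 = ID, $2 = ID_LIKE (space-separated, may be empty)
    for id in $1 $2; do
        case "$id" in
            arch)          echo "Arch Linux";    return ;;
            debian|ubuntu) echo "Debian/Ubuntu"; return ;;
            fedora)        echo "Fedora";        return ;;
        esac
    done
    echo "From source"
}

if [ -r /etc/os-release ]; then
    # Source the file in a subshell so its variables do not leak.
    ( . /etc/os-release; install_tab_for "${ID:-}" "${ID_LIKE:-}" )
else
    install_tab_for "" ""
fi
```

Derivatives are matched through ID_LIKE, so a distribution that reports ID=manjaro and ID_LIKE=arch resolves to the Arch Linux instructions.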

Install text input driver

Voxtype needs a way to output text. Install wtype for best Unicode/CJK support on Wayland.

sudo dnf install wtype wl-clipboard

wtype works on most Wayland compositors. For KDE/GNOME, install dotool or ydotool instead. See Output Modes for details.
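To see which of the output tools mentioned above are already installed, a short loop over command -v is enough. This is a sketch: the function name list_output_tools is hypothetical, and wl-copy is checked because it is the copy command shipped by the wl-clipboard package.

```shell
#!/bin/sh
# Report which text-output tools named in this guide are on PATH.
list_output_tools() {
    for tool in wtype dotool ydotool wl-copy; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
        fi
    done
}

list_output_tools
```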

Download transcription model

Voxtype uses Whisper models for offline speech recognition. Download the base.en model for a good balance of speed and accuracy.

voxtype setup --download

This downloads the default model (~142 MB) to ~/.local/share/voxtype/.

Available models

Model	Size	Accuracy	Speed
tiny.en	39 MB	~10% WER	Fastest
base.en	142 MB	~8% WER	Fast
small.en	466 MB	~6% WER	Medium
medium.en	1.5 GB	~5% WER	Slow
large-v3-turbo	1.6 GB	~4% WER	Fast (with GPU)

Use voxtype setup model for interactive selection or voxtype setup --download --model <name> to download a specific model.
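If disk space is the main constraint, the table above can be turned into a simple lookup. The sketch below is illustrative only: pick_model is a hypothetical helper, the sizes are copied from the table with 1.5 GB and 1.6 GB rounded to 1500 MB and 1600 MB (an assumption), and it ignores speed, so it will pick large-v3-turbo even on a machine without a GPU.

```shell
#!/bin/sh
# Print the largest model from the table that fits a disk budget in MB.
# Prints nothing and returns non-zero if no model fits.
pick_model() {
    budget_mb=$1
    choice=""
    # name:size_mb pairs in ascending size order
    for entry in tiny.en:39 base.en:142 small.en:466 medium.en:1500 large-v3-turbo:1600; do
        name=${entry%%:*}
        size=${entry##*:}
        if [ "$size" -le "$budget_mb" ]; then
            choice=$name
        fi
    done
    [ -n "$choice" ] && echo "$choice"
}

pick_model 500
```

The result can be passed to the download command shown above, for example voxtype setup --download --model "$(pick_model 500)".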

Configure compositor keybinding

The best way to use Voxtype is with your compositor’s native keybindings. This provides push-to-talk without special permissions.

Hyprland

Add to ~/.config/hypr/hyprland.conf:

bind = SUPER, V, exec, voxtype record start
bindr = SUPER, V, exec, voxtype record stop

Reload config: hyprctl reload

Sway

Add to ~/.config/sway/config:

bindsym --no-repeat $mod+v exec voxtype record start
bindsym --release $mod+v exec voxtype record stop

Reload config: swaymsg reload

River

Add to ~/.config/river/init:

riverctl map normal Super V spawn 'voxtype record start'
riverctl map -release normal Super V spawn 'voxtype record stop'

Restart River to apply changes.

X11 / Built-in hotkey

If you’re on X11 or prefer the built-in hotkey, add yourself to the input group:

# Add yourself to input group
sudo usermod -aG input "$USER"
# Log out and back in for changes to take effect

The built-in hotkey is ScrollLock by default.

With compositor keybindings in place, disable the built-in hotkey to avoid conflicts (this command overwrites any existing config.toml):

mkdir -p ~/.config/voxtype
cat > ~/.config/voxtype/config.toml << 'EOF'
[hotkey]
enabled = false
EOF

If using compositor keybindings, you must set enabled = false to disable the built-in hotkey.
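To confirm that the setting took effect, you can read the value back from the file. The sketch below uses awk rather than a real TOML parser, so it only handles simple layouts like the one above (a bare [hotkey] header followed by enabled = ... on its own line); hotkey_enabled_value is a hypothetical helper name.

```shell
#!/bin/sh
# Print the value of "enabled" inside the [hotkey] table of a config file.
# Prints nothing if the table or the key is absent.
hotkey_enabled_value() {
    awk '
        # Any table header switches the "inside [hotkey]" flag on or off.
        /^[ \t]*\[/ { in_hotkey = ($0 ~ /^[ \t]*\[hotkey\][ \t]*$/); next }
        in_hotkey && /^[ \t]*enabled[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")     # drop "enabled = "
            sub(/[ \t]*(#.*)?$/, "")     # drop trailing spaces and comments
            print
            exit
        }
    ' "$1"
}

cfg="$HOME/.config/voxtype/config.toml"
if [ -f "$cfg" ]; then
    echo "hotkey.enabled = $(hotkey_enabled_value "$cfg")"
fi
```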

Start the daemon

Run Voxtype to start the transcription daemon.

Foreground

voxtype

Keeps the daemon running in your terminal. Press Ctrl+C to stop.

Systemd

# Install service
voxtype setup systemd

# Enable and start
systemctl --user enable --now voxtype

# Check status
systemctl --user status voxtype

The daemon starts automatically on login.

You should see output like:

[INFO] Loading model: base.en
[INFO] Model loaded successfully
[INFO] Daemon started, waiting for hotkey press...
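If the terminal output has scrolled away, or you want to check from a script, the daemon's state can be probed directly. This is a sketch under two assumptions: the user service is named voxtype, as in the systemctl commands above, and a foreground daemon shows up as a process named voxtype. The function name daemon_status is hypothetical.

```shell
#!/bin/sh
# Report whether a Voxtype daemon appears to be running.
daemon_status() {
    if command -v systemctl >/dev/null 2>&1 &&
       systemctl --user is-active --quiet voxtype 2>/dev/null; then
        echo "running (systemd user service)"
    elif pgrep -x voxtype >/dev/null 2>&1; then
        # -x matches the process name exactly
        echo "running (foreground process)"
    else
        echo "not running"
    fi
}

daemon_status
```

Note that pgrep also matches short-lived voxtype record invocations, so a result taken while a keybinding is firing may be misleading.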

Test voice input

Now test push-to-talk:

Click in any text field (browser, text editor, terminal)
Hold Super+V (or your configured hotkey)
Speak clearly: “This is a test of voice to text”
Release the key

After a moment, the transcribed text should appear at your cursor.

The first transcription may take a few seconds while the model loads. Subsequent transcriptions are much faster.

What’s next?

Basic usage

Learn push-to-talk, toggle mode, and hotkey configuration

Configuration

Customize models, audio, output, and text processing

Transcription engines

Explore 7 engines for different languages and use cases

GPU acceleration

Enable Vulkan, CUDA, or ROCm for faster inference

Troubleshooting

Text not appearing

Symptom: Voxtype records but text doesn’t appear at the cursor.

Solution:

Verify wtype is installed: which wtype
Check daemon logs for output driver errors
Try clipboard mode: voxtype --clipboard
See Output Modes for driver setup

Cannot open input device

Symptom: Error: “Cannot open input device”

Solution:

If using compositor keybindings: Set [hotkey] enabled = false in config
If using built-in hotkey: Add yourself to input group: sudo usermod -aG input $USER and log out/in
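A common reason the second fix appears not to work is that the group change has not reached the current login session. The sketch below compares the groups of the running session (id -nG with no user) against the groups recorded for the account (id -nG with a user name); in_group is a hypothetical helper.

```shell
#!/bin/sh
# Return success if group $1 appears in the space-separated list $2.
in_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

session_groups=$(id -nG)                  # groups of this login session
account_groups=$(id -nG "$(id -un)")      # groups recorded for the account

if in_group input "$session_groups"; then
    echo "input group is active in this session"
elif in_group input "$account_groups"; then
    echo "input group is assigned but not active: log out and back in"
else
    echo "not in the input group: run usermod first"
fi
```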

No audio captured

Symptom: Recording starts but produces empty transcription.

Solution:

# List audio sources
pactl list sources short

# Test recording
arecord -d 3 -f S16_LE -r 16000 test.wav
aplay test.wav

# Configure device in config.toml
[audio]
device = "alsa_input.your_device_name"
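To find a candidate for the device value, the second column of pactl list sources short holds the source names. The sketch below filters out monitor sources, which capture playback rather than a microphone. It assumes the device setting accepts a source name of this form, as the placeholder above suggests; input_source_names is a hypothetical helper.

```shell
#!/bin/sh
# Read "pactl list sources short" output on stdin (tab-separated) and
# print the names of sources that are not monitors of an output.
input_source_names() {
    awk -F '\t' '$2 != "" && $2 !~ /\.monitor$/ { print $2 }'
}

if command -v pactl >/dev/null 2>&1; then
    pactl list sources short 2>/dev/null | input_source_names
fi
```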

Model not found

Symptom: Error: “Model file not found”

Solution:

# Download the default model
voxtype setup --download

# Or download a specific model
voxtype setup --download --model base.en

# Verify models
ls ~/.local/share/voxtype/
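The ls check above can be wrapped so that it distinguishes a missing directory from an empty one. This sketch only looks at whether the directory contains anything; it does not know Voxtype's model file names, so a non-empty directory is not proof that the right model is present. model_dir_status is a hypothetical helper.

```shell
#!/bin/sh
# Classify a model directory as missing, empty, or populated.
model_dir_status() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing directory"
    elif [ -z "$(ls -A "$dir")" ]; then
        echo "empty"
    else
        echo "has files"
    fi
}

model_dir_status "$HOME/.local/share/voxtype"
```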

For more troubleshooting, see the Troubleshooting guide.

Community

GitHub: github.com/peteonrails/voxtype
Issues: Report bugs or request features
Discussions: Ask questions
