Get Voxtype installed and transcribing speech in just a few steps.

Installation

Step 1: Install Voxtype

Choose your distribution’s package manager or install from source. For example, on Arch Linux with the yay AUR helper:
yay -S voxtype
# or
yay -S voxtype-bin
See the Installation page for all options including GPU acceleration variants.

Step 2: Install text input driver

Voxtype needs a way to output text. Install wtype for best Unicode/CJK support on Wayland. For example, on Fedora:
sudo dnf install wtype wl-clipboard
wtype works on most Wayland compositors. For KDE/GNOME, install dotool or ydotool instead. See Output Modes for details.
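The choice described above can be sketched as a small shell helper. This is only an illustration: `suggest_driver` is a hypothetical function, not part of Voxtype, and it assumes your session sets `XDG_CURRENT_DESKTOP` to a value containing "KDE" or "GNOME" on those desktops.

```shell
#!/bin/sh
# Suggest a text input driver for a given desktop name.
# Mapping follows the note above: dotool/ydotool for KDE and GNOME, wtype otherwise.
suggest_driver() {
    # Lower-case the name so "KDE", "kde", and "ubuntu:GNOME" all match.
    desktop=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$desktop" in
        *kde*|*gnome*) echo "dotool or ydotool" ;;
        *)             echo "wtype" ;;
    esac
}

# XDG_CURRENT_DESKTOP is set by most desktop sessions; it may be empty elsewhere.
echo "Suggested driver: $(suggest_driver "${XDG_CURRENT_DESKTOP:-}")"
```

Treat the result as a starting point and confirm it against the Output Modes page.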

Step 3: Download transcription model

Voxtype uses Whisper models for offline speech recognition. Download the base.en model for a good balance of speed and accuracy.
voxtype setup --download
This downloads the default model (~142 MB) to ~/.local/share/voxtype/.
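| Model | Size | Accuracy | Speed |
| --- | --- | --- | --- |
| tiny.en | 39 MB | ~10% WER | Fastest |
| base.en | 142 MB | ~8% WER | Fast |
| small.en | 466 MB | ~6% WER | Medium |
| medium.en | 1.5 GB | ~5% WER | Slow |
| large-v3-turbo | 1.6 GB | ~4% WER | Fast (with GPU) |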
Use voxtype setup model for interactive selection or voxtype setup --download --model <name> to download a specific model.
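If disk space or bandwidth is the deciding factor, the table can be turned into a simple lookup. The sketch below uses a hypothetical `pick_model` helper (not a Voxtype command) that returns the most accurate listed model fitting a size budget in MB. It assumes 1.5 GB and 1.6 GB round to 1536 MB and 1638 MB, and it ignores speed.

```shell
#!/bin/sh
# Print the most accurate model from the table whose download size fits a budget (MB).
# Sizes are copied from the table; the two GB figures are rounded to MB.
pick_model() {
    budget_mb=$1
    choice=""
    # Entries run from least to most accurate, so the last one that fits wins.
    for entry in tiny.en:39 base.en:142 small.en:466 medium.en:1536 large-v3-turbo:1638; do
        name=${entry%%:*}
        size=${entry##*:}
        if [ "$size" -le "$budget_mb" ]; then
            choice=$name
        fi
    done
    [ -n "$choice" ] || return 1
    echo "$choice"
}

# Print the matching download command for a 500 MB budget.
model=$(pick_model 500) && echo "voxtype setup --download --model $model"
```

The helper only prints the command, so you can review it before running it.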

Step 4: Configure compositor keybinding

The best way to use Voxtype is with your compositor’s native keybindings. This provides push-to-talk without special permissions.
Add to ~/.config/hypr/hyprland.conf:
bind = SUPER, V, exec, voxtype record start
bindr = SUPER, V, exec, voxtype record stop
Reload config: hyprctl reload
Then disable the built-in hotkey to avoid conflicts. Note that the command below overwrites any existing config.toml:
mkdir -p ~/.config/voxtype
cat > ~/.config/voxtype/config.toml << 'EOF'
[hotkey]
enabled = false
EOF
If using compositor keybindings, you must set enabled = false to disable the built-in hotkey.
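If you already have a config.toml, a more cautious variant appends the section instead of replacing the file. This is a minimal sketch: `disable_builtin_hotkey` is a hypothetical helper, and it assumes the file does not already define the hotkey settings in some other form (for example as dotted keys).

```shell
#!/bin/sh
# Disable the built-in hotkey without overwriting an existing config file.
# Returns 1 and leaves the file untouched if a [hotkey] section already exists.
disable_builtin_hotkey() {
    config=$1
    mkdir -p "$(dirname "$config")"
    if [ -f "$config" ] && grep -q '^\[hotkey\]' "$config"; then
        echo "existing [hotkey] section found; edit $config by hand"
        return 1
    fi
    # Append a new [hotkey] table at the end of the file.
    printf '\n[hotkey]\nenabled = false\n' >> "$config"
    echo "wrote [hotkey] enabled = false to $config"
}

# Example call, using the path from the step above (left commented out on purpose):
# disable_builtin_hotkey "$HOME/.config/voxtype/config.toml"
```

Running the helper a second time is a no-op, since it detects the section it wrote earlier.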

Step 5: Start the daemon

Run Voxtype to start the transcription daemon.
voxtype
This keeps the daemon running in the foreground of your terminal. Press Ctrl+C to stop.
You should see output like:
[INFO] Loading model: base.en
[INFO] Model loaded successfully
[INFO] Daemon started, waiting for hotkey press...

Step 6: Test voice input

Now test push-to-talk:
  1. Click in any text field (browser, text editor, terminal)
  2. Hold Super+V (or your configured hotkey)
  3. Speak clearly: “This is a test of voice to text”
  4. Release the key
After a moment, the transcribed text should appear at your cursor.
The first transcription may take a few seconds while the model loads. Subsequent transcriptions are much faster.
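To rule out the keybinding itself, the same start/stop commands bound in the compositor config can be run by hand from a second terminal while the daemon is running. The sketch below wraps them in a hypothetical `ptt_cycle` helper. `VOXTYPE_BIN` is an illustrative override for dry runs, not a Voxtype setting.

```shell
#!/bin/sh
# Simulate one push-to-talk cycle from a terminal.
# VOXTYPE_BIN is a hypothetical override for dry runs; it defaults to voxtype.
ptt_cycle() {
    seconds=${1:-3}
    bin=${VOXTYPE_BIN:-voxtype}
    "$bin" record start || return 1
    sleep "$seconds"    # speak during this window
    "$bin" record stop
}

# Dry run: substitute echo to print the two commands instead of running them.
VOXTYPE_BIN=echo ptt_cycle 0
```

For a real test, run `ptt_cycle 3` without the override and speak during the pause. The text should be typed into whichever window has focus, which may be the terminal itself.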

What’s next?

Basic usage

Learn push-to-talk, toggle mode, and hotkey configuration

Configuration

Customize models, audio, output, and text processing

Transcription engines

Explore 7 engines for different languages and use cases

GPU acceleration

Enable Vulkan, CUDA, or ROCm for faster inference

Troubleshooting

Symptom: Voxtype records but text doesn’t appear at cursor.
Solution:
  1. Verify wtype is installed: which wtype
  2. Check daemon logs for output driver errors
  3. Try clipboard mode: voxtype --clipboard
  4. See Output Modes for driver setup
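The first check in the list above can be extended to all three drivers this guide mentions. A minimal sketch, where `first_available` is a hypothetical helper built on the standard `command -v`:

```shell
#!/bin/sh
# Print the first command from the argument list that is found on PATH.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

# Driver names are the ones mentioned in this guide.
if driver=$(first_available wtype dotool ydotool); then
    echo "Found output driver: $driver"
else
    echo "No output driver found; install wtype, dotool, or ydotool"
fi
```

Finding a binary on PATH does not prove it works with your compositor, so the daemon logs remain the next thing to check.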

Symptom: Error: “Cannot open input device”
Solution:
  • If using compositor keybindings: Set [hotkey] enabled = false in config
  • If using built-in hotkey: Add yourself to the input group with sudo usermod -aG input $USER, then log out and back in
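Group changes only take effect in new login sessions, so it can help to confirm that the current session actually has the input group. A small sketch using the standard `id -nG`; `has_group` is a hypothetical helper:

```shell
#!/bin/sh
# Return success if the first argument appears among the remaining arguments.
has_group() {
    group=$1
    shift
    for g in "$@"; do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# id -nG prints the current session's groups; left unquoted so it splits into words.
if has_group input $(id -nG); then
    echo "input group is active in this session"
else
    echo "input group is not active; add yourself to it, then log out and back in"
fi
```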

Symptom: Recording starts but produces empty transcription.
Solution:
# List audio sources
pactl list sources short

# Test recording
arecord -d 3 -f S16_LE -r 16000 test.wav
aplay test.wav

# Configure device in config.toml
[audio]
device = "alsa_input.your_device_name"
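As a rough sanity check on the test recording above, 3 seconds of 16 kHz, 16-bit audio should hold about 96,000 bytes of sample data. The sketch below assumes arecord's default of one channel and a WAV header of roughly 44 bytes, neither of which this guide states. A file close to header size suggests nothing was captured, although a full-size file can still contain silence.

```shell
#!/bin/sh
# Expected size in bytes of raw PCM audio: seconds * rate * bytes per sample * channels.
pcm_bytes() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# S16_LE is 2 bytes per sample; mono is an assumption (arecord's default).
expected=$(pcm_bytes 3 16000 2 1)
echo "Expected sample data: $expected bytes"

# Compare with the actual file if the test recording exists in this directory.
if [ -f test.wav ]; then
    actual=$(( $(wc -c < test.wav) ))
    echo "test.wav is $actual bytes"
fi
```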

Symptom: Error: “Model file not found”
Solution:
# Download the default model
voxtype setup --download

# Or download a specific model
voxtype setup --download --model base.en

# Verify models
ls ~/.local/share/voxtype/
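The last check can be scripted so that it reports an empty or missing directory explicitly. This sketch only tests that the directory contains something. It does not know Voxtype's model file names, so other files in that directory would also satisfy it. `dir_has_files` is a hypothetical helper.

```shell
#!/bin/sh
# Return success if the directory exists and contains at least one entry.
dir_has_files() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Path taken from the download step in this guide.
model_dir="$HOME/.local/share/voxtype"
if dir_has_files "$model_dir"; then
    echo "Models directory is populated: $model_dir"
else
    echo "No files in $model_dir; run: voxtype setup --download"
fi
```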
For more troubleshooting, see the Troubleshooting guide.
