Voice to text for Linux

Push-to-talk voice transcription with offline processing, GPU acceleration, and native Wayland integration. Transform speech into text instantly without sending data to the cloud.

terminal
$ voxtype
✓ Daemon started
Model: base.en
Backend: GPU (Vulkan)
Ready for input…

Quick Start

Get Voxtype running on your Linux system in minutes

1. Install Voxtype

Install Voxtype from the AUR on Arch, or use the provided .deb or .rpm package for your distribution.
yay -S voxtype

2. Install dependencies

Voxtype requires a text input driver. Install wtype for best Unicode/CJK support on Wayland.
# Fedora
sudo dnf install wtype

# Arch
sudo pacman -S wtype

# Ubuntu
sudo apt install wtype
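
For scripted setups, the choice between the three commands above can be automated. The following is a minimal sketch, not something shipped with Voxtype: the function names `wtype_install_cmd` and `detect_pkg_manager` are hypothetical, and the script only prints the install command for the first package manager it finds instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Voxtype): print the wtype install
# command for a given package manager, mirroring the commands above.
wtype_install_cmd() {
    case "$1" in
        dnf)    echo "sudo dnf install wtype" ;;
        pacman) echo "sudo pacman -S wtype" ;;
        apt)    echo "sudo apt install wtype" ;;
        *)      echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the first package manager found on PATH.
detect_pkg_manager() {
    for pm in dnf pacman apt; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1
}

# Print the command for this system; review it before running it.
if pm=$(detect_pkg_manager); then
    wtype_install_cmd "$pm"
fi
```

Printing the command rather than executing it keeps the `sudo` step explicit and under your control.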

3. Download transcription model

Download a Whisper model for offline transcription. The base.en model provides a good balance of speed and accuracy.
voxtype setup --download

4. Configure compositor keybinding

Add a keybinding to your compositor for push-to-talk activation.
Add to ~/.config/hypr/hyprland.conf:
bind = SUPER, V, exec, voxtype record start
bindr = SUPER, V, exec, voxtype record stop
Add to ~/.config/sway/config:
bindsym --no-repeat $mod+v exec voxtype record start
bindsym --release $mod+v exec voxtype record stop
Add to ~/.config/river/init:
riverctl map normal Super V spawn 'voxtype record start'
riverctl map -release normal Super V spawn 'voxtype record stop'
Then disable the built-in hotkey in your config:
# ~/.config/voxtype/config.toml
[hotkey]
enabled = false
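
The same config change can be scripted. This is a sketch under stated assumptions: the helper name `disable_builtin_hotkey` is hypothetical, and it only appends the `[hotkey]` section shown above when the file has no such section yet. If a `[hotkey]` section already exists, it refuses and leaves the file for manual editing.

```shell
#!/bin/sh
# Hypothetical helper: append the [hotkey] section from the step above to a
# Voxtype config file, unless the file already contains a [hotkey] section.
disable_builtin_hotkey() {
    config_path=$1
    # Create the config directory and file if they do not exist yet.
    mkdir -p "$(dirname "$config_path")"
    touch "$config_path"
    if grep -q '^\[hotkey\]' "$config_path"; then
        echo "[hotkey] section already present in $config_path; edit it manually" >&2
        return 1
    fi
    printf '\n[hotkey]\nenabled = false\n' >> "$config_path"
}

# Usage:
#   disable_builtin_hotkey "$HOME/.config/voxtype/config.toml"
```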

5. Start the daemon

Run Voxtype in the foreground to test, or set it up as a systemd service.
# Run in foreground
voxtype

# Or install as systemd service
voxtype setup systemd
systemctl --user enable --now voxtype
Hold your configured hotkey, speak, and release. Transcribed text appears at your cursor position.
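
If no text appears when running as a service, checking the unit state is a reasonable first step. The sketch below assumes the unit is named `voxtype`, as in the `systemctl` command above; the function name `voxtype_service_state` is hypothetical, and whether the daemon writes useful output to the journal is also an assumption.

```shell
#!/bin/sh
# Report the state of the user-level voxtype unit enabled in the step above.
# Prints "no-systemd" when systemctl is not available on this system.
voxtype_service_state() {
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "no-systemd"
        return 0
    fi
    # is-active prints the state (active, inactive, failed, ...) and exits
    # non-zero unless the unit is active, so the exit status is ignored here.
    systemctl --user is-active voxtype 2>/dev/null || true
}

voxtype_service_state

# To follow the service logs while testing (assumes the daemon logs to the journal):
#   journalctl --user-unit=voxtype -f
```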

Key Features

Everything you need for voice-to-text on Linux

Offline transcription

Seven transcription engines, including Whisper, Parakeet, and Moonshine. Speech is processed locally, with no internet connection required.

GPU acceleration

Vulkan, CUDA, and ROCm support for sub-second inference on large models.

Wayland native

First-class integration with Hyprland, Sway, and River compositors.

Meeting mode

Continuous transcription with speaker attribution and export to Markdown, JSON, SRT, or VTT.

Multiple output modes

Type directly, paste via the clipboard, or write to files, with automatic fallback handling.

Text processing

Spoken punctuation, word replacements, and LLM post-processing integration.

Ready to get started?

Install Voxtype on your Linux system and start transcribing speech with push-to-talk voice recognition.
