Voice-to-text for Linux

Push-to-talk voice transcription with offline processing, GPU acceleration, and native Wayland integration. Speech is transcribed on your own machine, so no audio is sent to the cloud.

$ voxtype
✓ Daemon started
Model: base.en
Backend: GPU (Vulkan)
● Ready for input…

Quick Start

Get Voxtype running on your Linux system in minutes

Install Voxtype

On Arch, install Voxtype from the AUR (shown here with the yay helper). On other distributions, use the provided .deb or .rpm package.

yay -S voxtype

Install dependencies

Voxtype needs a text input driver to type the transcribed text. On Wayland, install wtype for the best Unicode and CJK support.

# Fedora
sudo dnf install wtype

# Arch
sudo pacman -S wtype

# Ubuntu
sudo apt install wtype
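
Before starting Voxtype, you can confirm that wtype is on your PATH and that you are in a Wayland session. The POSIX shell sketch below is an unofficial helper, not part of Voxtype; the function names `has_cmd` and `check_voxtype_deps` are hypothetical, and the script assumes only that Wayland compositors set the standard `WAYLAND_DISPLAY` variable.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not a Voxtype command.
# Verifies that wtype is installed and that WAYLAND_DISPLAY is set.

# has_cmd NAME: succeed if NAME resolves to a command on PATH.
has_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# check_voxtype_deps: print one status line per check; return 1 if any check fails.
check_voxtype_deps() {
    rc=0
    if has_cmd wtype; then
        echo "ok: wtype found at $(command -v wtype)"
    else
        echo "missing: wtype (install it with your package manager)"
        rc=1
    fi
    if [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo "ok: Wayland session ($WAYLAND_DISPLAY)"
    else
        echo "warning: WAYLAND_DISPLAY is not set; wtype needs a Wayland session"
        rc=1
    fi
    return "$rc"
}

check_voxtype_deps || echo "Fix the issues above before starting Voxtype."
```

The script only reads `PATH` and the environment; it does not install or change anything.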

Download transcription model

Download a Whisper model for offline transcription. The English-only base.en model provides a good balance of speed and accuracy.

voxtype setup --download

Configure compositor keybinding

Bind a key in your compositor so that pressing it starts recording and releasing it stops recording (push-to-talk).

Hyprland configuration

Add to ~/.config/hypr/hyprland.conf:

bind = SUPER, V, exec, voxtype record start
bindr = SUPER, V, exec, voxtype record stop

Sway configuration

Add to ~/.config/sway/config:

bindsym --no-repeat $mod+v exec voxtype record start
bindsym --release $mod+v exec voxtype record stop

River configuration

Add to ~/.config/river/init:

riverctl map normal Super V spawn 'voxtype record start'
riverctl map -release normal Super V spawn 'voxtype record stop'

Because the compositor now starts and stops recording, disable Voxtype’s built-in hotkey in its config:

# ~/.config/voxtype/config.toml
[hotkey]
enabled = false
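
If you prefer to script this step, the sketch below appends the `[hotkey]` table only when the config file does not already define one, since TOML does not allow the same table to be declared twice. It is a hypothetical helper (`disable_builtin_hotkey` is not a Voxtype command) and assumes the config location shown above; pass a different path as the first argument to override it.

```shell
#!/bin/sh
# Hypothetical helper, not a Voxtype command.
# Appends "[hotkey] enabled = false" unless a [hotkey] table already exists.
# Assumes the default config path ~/.config/voxtype/config.toml.

disable_builtin_hotkey() {
    config="${1:-$HOME/.config/voxtype/config.toml}"
    mkdir -p "$(dirname "$config")" || return 1
    touch "$config" || return 1

    # TOML forbids defining the same table twice, so never add a second [hotkey].
    if grep -q '^[[:space:]]*\[hotkey\][[:space:]]*$' "$config"; then
        echo "[hotkey] already present in $config; set enabled = false there by hand"
        return 1
    fi

    printf '\n[hotkey]\nenabled = false\n' >> "$config"
    echo "disabled built-in hotkey in $config"
}
```

Run `disable_builtin_hotkey` with no arguments to edit the default file. If it reports an existing `[hotkey]` table, set `enabled = false` inside that table by hand.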

Start the daemon

Run Voxtype in the foreground to test it, or install it as a systemd user service.

# Run in foreground
voxtype

# Or install it as a systemd user service
voxtype setup systemd
systemctl --user enable --now voxtype

Hold your configured hotkey, speak, and release. Transcribed text appears at your cursor position.

Key Features

Everything you need for voice-to-text on Linux

Offline transcription

Seven transcription engines, including Whisper, Parakeet, and Moonshine, process speech locally without an internet connection.

GPU acceleration

Vulkan, CUDA, and ROCm backends, enabling sub-second inference on large models with a capable GPU.

Wayland native

First-class integration with Hyprland, Sway, and River compositors.

Meeting mode

Continuous transcription with speaker attribution and export to Markdown, JSON, SRT, or VTT.

Multiple output modes

Type text directly, paste it via the clipboard, or write it to a file, with automatic fallback to another method when the preferred one fails.

Text processing

Spoken punctuation, word replacements, and LLM post-processing integration.

Explore by Topic

Dive deeper into Voxtype’s capabilities

Basic Usage

Learn push-to-talk controls, toggle mode, and how to configure your preferred hotkey.

Configuration

Customize models, audio settings, output behavior, and post-processing with TOML configuration.

Transcription Engines

Choose from 7 engines: Whisper, Parakeet, Moonshine, SenseVoice, Paraformer, Dolphin, and Omnilingual.

Architecture

Understand Voxtype’s design: hotkey detection, audio capture, transcription backends, and output drivers.

Ready to get started?

Install Voxtype on your Linux system and start transcribing speech with push-to-talk voice recognition.
