Basic Usage
Voxtype is a push-to-talk voice-to-text tool for Linux. This guide covers the essential workflows for daily use.
The Push-to-Talk Workflow
- Start the daemon: Run voxtype in a terminal or enable the systemd service
- Hold your hotkey: Default is ScrollLock
- Speak clearly: Talk at a normal pace
- Release the hotkey: Your speech is transcribed
- Text appears: Either typed at cursor or copied to clipboard
Running the Daemon
Foreground Mode
Run the daemon directly in a terminal by invoking voxtype.
Systemd Service
For automatic startup on login, enable the systemd service.
Push-to-Talk vs Toggle Mode
Voxtype supports two activation modes:
Push-to-Talk (Default)
Hold the hotkey to record, release to transcribe.
Toggle Mode
Press once to start recording, press again to stop. The --toggle option selects this mode from the command line.
The built-in hotkey uses evdev (Linux input subsystem) to detect key presses. This works on both Wayland and X11.
Prerequisites
You must be in the input group.
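Group membership can be checked before starting the daemon. The following is a minimal POSIX shell sketch, not part of Voxtype; the helper name in_input_group is illustrative. The usermod command it prints is the standard way to add a user to a supplementary group, and a fresh login is needed before the change takes effect.

```shell
#!/bin/sh
# Succeeds if the given user (default: the current user) belongs to the "input" group.
# in_input_group is an illustrative helper, not a Voxtype command.
in_input_group() {
    id -nG "${1:-$(id -un)}" 2>/dev/null | tr ' ' '\n' | grep -qx 'input'
}

if in_input_group; then
    echo "already in the input group"
else
    # Standard usermod call: -a appends, -G names the supplementary group.
    echo "not in the input group; run: sudo usermod -aG input $(id -un)"
fi
```

The script only reports the state and prints the command, so it is safe to run without root.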
Default Hotkey
The default hotkey is ScrollLock. This is chosen because:
- It’s rarely used for other purposes
- It doesn’t interfere with normal typing
- It’s available on most keyboards
Changing the Hotkey
Edit ~/.config/voxtype/config.toml, or override the hotkey for a single run with the --hotkey <KEY> option.
Using Compositor Keybindings
Instead of the built-in evdev hotkey, you can use your compositor’s native keybindings. This has several advantages:
- No input group membership required
- Use any key combination (e.g., Super+V)
- Native feel with familiar keybinding configuration
Setup
- Disable the built-in hotkey in ~/.config/voxtype/config.toml, or start the daemon with the --no-hotkey option.
- Configure your compositor. See the Compositor Integration guide for detailed setup instructions.
Canceling Recording or Transcription
You can cancel an active recording or an in-progress transcription without outputting any text.
With Built-in Hotkey
Configure a cancel key in your config. Key names include:
- ESC: Escape key
- BACKSPACE: Backspace key
- F12: Function key
With Compositor Keybindings
Bind a key in your compositor to the cancel command.
Model Selection and Switching
Voxtype supports multiple Whisper models with different trade-offs between speed and accuracy.
Default Model
The default model is base.en, which provides a good balance between speed and accuracy.
Available Models
| Model | Size | Speed | Accuracy | Languages |
|---|---|---|---|---|
| tiny.en | 39 MB | Fastest | Good | English only |
| base.en | 142 MB | Fast | Better | English only |
| small.en | 466 MB | Medium | Great | English only |
| medium.en | 1.5 GB | Slow | Excellent | English only |
| large-v3 | 3.1 GB | Slowest | Best | 99 languages |
| large-v3-turbo | 1.6 GB | Fast | Excellent | 99 languages (GPU recommended) |
Switching Models
Change the model in your config, or override it for a single run with the --model <MODEL> option.
Interactive Model Selection
Use the setup command for guided selection. It will:
- Show all available models
- Download your selection if needed
- Update your config file
- Optionally restart the daemon
Multi-Model Support
You can configure multiple models and switch between them.
Audio Feedback
Enable audio cues when recording starts and stops.
Built-in Themes
- default - Clear, pleasant two-tone beeps
- subtle - Quiet, unobtrusive clicks
- mechanical - Typewriter/keyboard-like sounds
Custom Themes
Point to a directory containing start.wav, stop.wav, and error.wav.
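A theme directory that lacks one of the three files is easy to overlook. The helper below is a small POSIX shell sketch that reports any missing file; check_theme_dir is a made-up name for illustration and is not part of Voxtype.

```shell
#!/bin/sh
# Print each expected sound file that is absent from the directory given as $1.
# Returns 0 when all three files exist, 1 otherwise.
check_theme_dir() {
    theme_dir=$1
    missing=0
    for name in start.wav stop.wav error.wav; do
        if [ ! -f "$theme_dir/$name" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Call it with the path to your theme directory, for example check_theme_dir "$HOME/my-theme" (a hypothetical location), before pointing the config at it.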
Example Session
Common CLI Options
| Option | Description |
|---|---|
| -v, -vv | Increase verbosity (debug, trace) |
| -q, --quiet | Quiet mode (errors only) |
| --clipboard | Force clipboard mode |
| --paste | Force paste mode (clipboard + Ctrl+V) |
| --model <MODEL> | Override transcription model |
| --engine <ENGINE> | Override transcription engine (whisper, parakeet, moonshine, etc.) |
| --hotkey <KEY> | Override hotkey |
| --toggle | Use toggle mode |
| --no-hotkey | Disable built-in hotkey (use compositor keybindings) |
| -c, --config <FILE> | Use custom config file |
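These options can be combined on one command line. As a sketch, the wrapper functions below bundle a few combinations from the table. The function names are invented for this example, and whether every pairing of options is accepted together is an assumption worth checking against the CLI's own help output.

```shell
#!/bin/sh
# Illustrative wrappers around the voxtype CLI; only the flags come from the table above.

# Trace-level logging for troubleshooting; extra arguments are passed through.
vt_trace() { voxtype -vv "$@"; }

# Daemon without the built-in hotkey, for use with compositor keybindings.
vt_compositor() { voxtype --no-hotkey "$@"; }

# One-off run with a different model ($1), toggle activation, and clipboard output.
vt_try_model() { voxtype --model "$1" --toggle --clipboard; }
```

For example, vt_try_model small.en would start the daemon with the small.en model from the models table without touching the config file.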
Next Steps
- Compositor Integration: Set up push-to-talk with Hyprland, Sway, or River
- Configuration: Customize hotkeys, models, and output settings
- Output Modes: Choose between typing, clipboard, paste, and file output
- Text Processing: Post-process transcriptions with LLMs and word replacements