What is Voxtype?
Voxtype is a push-to-talk voice-to-text application for Linux that transforms your speech into text with a simple hotkey press. Hold your configured key, speak naturally, release, and watch your words appear at your cursor position.
Offline by default
Process speech locally using Whisper and other engines. No internet required.
Wayland native
First-class integration with Hyprland, Sway, and River compositors.
7 transcription engines
Choose from Whisper, Parakeet, Moonshine, SenseVoice, Paraformer, Dolphin, and Omnilingual.
GPU accelerated
Vulkan, CUDA, and ROCm support for sub-second inference on large models.
How it works
Voxtype uses a simple push-to-talk workflow:
- Hold your hotkey - Default is ScrollLock, or configure your compositor to use Super+V
- Speak naturally - Audio is captured from your microphone
- Release the key - Transcription begins using your chosen engine
- Text appears - Output is typed at your cursor, copied to clipboard, or written to a file
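The four steps above amount to a small state machine. The Python sketch below is illustrative only: `PushToTalk`, `transcribe`, and `emit` are hypothetical names rather than Voxtype's actual API, and the audio chunks are stand-in byte strings instead of real microphone data.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PushToTalk:
    """Hypothetical model of the hold / speak / release / output cycle."""

    transcribe: Callable[[bytes], str]  # stand-in for the chosen engine
    emit: Callable[[str], None]         # stand-in for type/clipboard/file output
    recording: bool = False
    _chunks: List[bytes] = field(default_factory=list)

    def key_down(self) -> None:
        # Step 1: the hotkey is held, so start a fresh recording.
        self.recording = True
        self._chunks.clear()

    def feed_audio(self, chunk: bytes) -> None:
        # Step 2: audio is buffered only while the key is held.
        if self.recording:
            self._chunks.append(chunk)

    def key_up(self) -> None:
        # Steps 3 and 4: releasing the key triggers transcription, then output.
        if not self.recording:
            return
        self.recording = False
        audio = b"".join(self._chunks)
        if audio:
            self.emit(self.transcribe(audio))


if __name__ == "__main__":
    results: List[str] = []
    ptt = PushToTalk(
        transcribe=lambda audio: f"<{len(audio)} bytes of speech>",
        emit=results.append,
    )
    ptt.key_down()
    ptt.feed_audio(b"\x00" * 320)
    ptt.key_up()
    print(results)
```

Keeping the engine and the output sink as injected callables mirrors the idea that both are configurable, without assuming anything about how Voxtype wires them together internally.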
Key features
Multiple transcription engines
Choose from 7 speech-to-text engines, each optimized for different languages and use cases:
- Whisper (default) - 99 languages, excellent accuracy
- Parakeet - Fast English transcription
- Moonshine - Edge devices, low memory
- SenseVoice - Chinese, Japanese, Korean
- Paraformer - Chinese-English bilingual
- Dolphin - 40 languages + Chinese dialects
- Omnilingual - 1600+ languages
Compositor integration
Voxtype integrates natively with Wayland compositors using their keybinding systems. This provides push-to-talk without requiring special permissions:
- Hyprland - Full support with submap integration
- Sway - Mode-based integration for clean modifier handling
- River - Native mode support
- X11 - Evdev fallback, which requires input group permission
Compositor integration is recommended over the built-in hotkey because it is more reliable and requires no extra permissions.
Flexible output modes
Choose how transcribed text reaches its destination:
- Type mode - Simulates keyboard input (wtype, dotool, or ydotool)
- Clipboard mode - Copies text to clipboard
- Paste mode - Copies and simulates Ctrl+V
- File mode - Writes directly to a file
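One way to picture the four modes is as a dispatch from a mode name to an action. The sketch below is not Voxtype's implementation: the `deliver` function and its mode strings are hypothetical, the use of `wtype` and `wl-copy` assumes a Wayland session with both tools installed, and appending (rather than overwriting) in file mode is an arbitrary choice made for the example.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional

Runner = Callable[[List[str]], None]


def _run(argv: List[str]) -> None:
    # Default runner: execute the external tool and fail loudly on error.
    subprocess.run(argv, check=True)


def deliver(
    mode: str,
    text: str,
    *,
    run: Runner = _run,
    path: Optional[Path] = None,
) -> None:
    """Hypothetical dispatcher for the four output modes."""
    if mode == "type":
        # Simulate keyboard input (assumes wtype; dotool/ydotool would differ).
        run(["wtype", text])
    elif mode == "clipboard":
        run(["wl-copy", text])
    elif mode == "paste":
        # Copy first, then simulate Ctrl+V.
        run(["wl-copy", text])
        run(["wtype", "-M", "ctrl", "v", "-m", "ctrl"])
    elif mode == "file":
        if path is None:
            raise ValueError("file mode needs a path")
        # Assumption: append one transcription per line.
        with path.open("a", encoding="utf-8") as fh:
            fh.write(text + "\n")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
```

Passing a fake `run` callable (for example `calls.append`) makes it possible to inspect which commands would be issued without typing into the focused window.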
Meeting mode
Record longer sessions with continuous transcription, speaker attribution, and export capabilities:
- Chunked processing for long recordings
- Speaker identification and labeling
- Export to Markdown, JSON, SRT, or VTT
- AI summarization with Ollama integration
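To illustrate the export step, the sketch below renders speaker-labeled segments as SRT subtitles. The `Segment` type and the `[speaker]` prefix convention are assumptions made for this example; Voxtype's own exporter and the exact layout of its output may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical transcript segment with times in seconds."""

    start: float
    end: float
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    # SRT timestamps use the form HH:MM:SS,mmm (comma before milliseconds).
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    # Each cue is: index, time range, text; cues are separated by a blank line.
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"[{seg.speaker}] {seg.text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([
        Segment(0.0, 2.25, "Speaker 1", "Shall we get started?"),
        Segment(2.5, 4.0, "Speaker 2", "Yes, go ahead."),
    ]))
```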
System requirements
Minimum
- OS: Linux with glibc 2.35+ (Ubuntu 24.04, Fedora 39, Arch, Debian Trixie)
- Desktop: Wayland or X11
- Audio: PipeWire or PulseAudio
- CPU: x86_64 with AVX2 support
- RAM: 1 GB available
- Disk: 500 MB for base.en model
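On Linux, one way to check the AVX2 requirement is to look for the `avx2` flag in `/proc/cpuinfo`. The helper below is a sketch written for this page, not a Voxtype command; `has_avx2` is a hypothetical name.

```python
from pathlib import Path


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line of /proc/cpuinfo text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("AVX2 supported:", has_avx2(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this check only applies to Linux")
```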
Use cases
Dictation
Write emails, documents, and code by voice. Faster than typing for long-form content.
Accessibility
Hands-free text input for users with mobility challenges or RSI.
Meeting notes
Record and transcribe meetings with speaker identification and export.
Multilingual
Support for 1600+ languages including CJK, with translation to English.
Design principles
Voxtype is built with these core principles:
- Privacy first - All processing happens locally by default
- Wayland native - First-class compositor integration
- Performance matters - GPU acceleration for real-time transcription
- Extensible - Multiple engines, output modes, and post-processing options
- Keyboard-driven - Pure command-line interface, no GUI required
Next steps
Quick start
Get Voxtype running in 5 minutes
Installation
Install on your distribution
Basic usage
Learn push-to-talk controls
Configuration
Customize to your workflow