Introduction

What is Voxtype?

Voxtype is a push-to-talk voice-to-text application for Linux that transforms your speech into text with a simple hotkey press. Hold your configured key, speak naturally, release, and watch your words appear at your cursor position.

Offline by default

Process speech locally using Whisper and other engines. No internet required.

Wayland native

First-class integration with Hyprland, Sway, and River compositors.

7 transcription engines

Choose from Whisper, Parakeet, Moonshine, SenseVoice, Paraformer, Dolphin, and Omnilingual.

GPU accelerated

Vulkan, CUDA, and ROCm support for sub-second inference on large models.

How it works

Voxtype uses a simple push-to-talk workflow:

Hold your hotkey - Default is ScrollLock, or configure your compositor to use Super+V
Speak naturally - Audio is captured from your microphone
Release the key - Transcription begins using your chosen engine
Text appears - Output is typed at your cursor, copied to clipboard, or written to a file

The entire process happens locally on your machine by default, with no data sent to external servers.
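The workflow above can be pictured as a two-state machine: idle, and recording while the key is held. The following is a minimal sketch of that idea, not Voxtype's actual implementation. The `PushToTalk` class and the `transcribe` and `emit` callables are hypothetical stand-ins for the chosen engine and output mode.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List


class State(Enum):
    IDLE = auto()
    RECORDING = auto()


@dataclass
class PushToTalk:
    """Hypothetical push-to-talk loop: buffer audio only while the key is held."""

    transcribe: Callable[[bytes], str]  # stand-in for a transcription engine
    emit: Callable[[str], None]         # stand-in for an output mode
    state: State = State.IDLE
    chunks: List[bytes] = field(default_factory=list)

    def key_down(self) -> None:
        # Pressing the hotkey starts a fresh recording.
        if self.state is State.IDLE:
            self.chunks.clear()
            self.state = State.RECORDING

    def audio(self, chunk: bytes) -> None:
        # Microphone data is kept only while recording.
        if self.state is State.RECORDING:
            self.chunks.append(chunk)

    def key_up(self) -> None:
        # Releasing the hotkey triggers transcription and output.
        if self.state is not State.RECORDING:
            return
        self.state = State.IDLE
        text = self.transcribe(b"".join(self.chunks))
        if text:
            self.emit(text)
```

In this sketch, audio that arrives while the key is up is discarded, and releasing a key that was never pressed does nothing.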

Key features

Multiple transcription engines

Choose from 7 different speech-to-text engines optimized for different languages and use cases:

Whisper (default) - 99 languages, excellent accuracy
Parakeet - Fast English transcription
Moonshine - Edge devices, low memory
SenseVoice - Chinese, Japanese, Korean
Paraformer - Chinese-English bilingual
Dolphin - 40 languages + Chinese dialects
Omnilingual - 1600+ languages

See Transcription Engines for a complete comparison and recommendations.

Compositor integration

Voxtype integrates natively with Wayland compositors through their own keybinding systems, which provides push-to-talk without special permissions. On X11, it falls back to evdev:

Hyprland - Full support with submap integration
Sway - Mode-based integration for clean modifier handling
River - Native mode support
X11 - Evdev fallback; requires membership in the input group

Compositor integration is recommended over the built-in hotkey: it is more reliable and requires no special permissions.

Flexible output modes

Choose how transcribed text reaches its destination:

Type mode - Simulates keyboard input (wtype, dotool, or ydotool)
Clipboard mode - Copies text to clipboard
Paste mode - Copies and simulates Ctrl+V
File mode - Writes directly to a file

Each mode supports automatic fallback if the primary method fails.
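Automatic fallback can be sketched as an ordered list of delivery methods tried in turn. This is an illustration under assumptions, not Voxtype's code: the `deliver` function, the method names, and the order in which methods are tried are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

# A named delivery method: (label, function that sends the text or raises).
OutputMethod = Tuple[str, Callable[[str], None]]


def deliver(text: str, methods: Iterable[OutputMethod]) -> Optional[str]:
    """Try each method in order; return the label of the first that succeeds."""
    for name, send in methods:
        try:
            send(text)
            return name
        except Exception:
            # This method failed; fall through to the next one.
            continue
    return None  # every method failed
```

A caller might pass a typing tool first and a clipboard writer second, so that text still reaches the user when simulated typing is unavailable.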

Meeting mode

Record longer sessions with continuous transcription, speaker attribution, and export capabilities:

Chunked processing for long recordings
Speaker identification and labeling
Export to Markdown, JSON, SRT, or VTT
AI summarization with Ollama integration
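As an illustration of the export step, the sketch below renders speaker-labeled segments as SRT, whose cues consist of an index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text. The `Segment` tuple layout and the "Speaker: text" prefix are assumptions made for this example, not Voxtype's documented output.

```python
from typing import Iterable, Tuple

# Hypothetical segment layout: (start seconds, end seconds, speaker label, text).
Segment = Tuple[float, float, str, str]


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    return "\n".join(blocks)
```

For example, `srt_timestamp(3661.5)` returns `01:01:01,500`.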

System requirements

Minimum

OS: Linux with glibc 2.35+ (Ubuntu 24.04, Fedora 39, Arch, Debian Trixie)
Desktop: Wayland or X11
Audio: PipeWire or PulseAudio
CPU: x86_64 with AVX2 support
RAM: 1 GB available
Disk: 500 MB for base.en model
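Two of these requirements, AVX2 support and the glibc version, can be checked with the Python standard library. The helper names below are illustrative and not part of Voxtype. The sketch assumes a Linux system where `/proc/cpuinfo` exposes a `flags` line.

```python
import platform
from typing import Tuple


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line of /proc/cpuinfo text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.35' into (2, 35) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def glibc_ok(minimum: str = "2.35") -> bool:
    """Check the C library this Python interpreter is linked against."""
    lib, version = platform.libc_ver()
    return lib == "glibc" and parse_version(version) >= parse_version(minimum)
```

To check the current machine, read `/proc/cpuinfo` and pass its contents to `has_avx2`, then call `glibc_ok()`. Versions are compared as integer tuples because a plain string comparison would rank "2.4" above "2.35".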

Use cases

Dictation

Write emails, documents, and code by voice. Often faster than typing for long-form content.

Accessibility

Hands-free text input for users with mobility challenges or RSI.

Meeting notes

Record and transcribe meetings with speaker identification and export.

Multilingual

Support for 1600+ languages including CJK, with translation to English.

Design principles

Voxtype is built with these core principles:

Privacy first - All processing happens locally by default
Wayland native - First-class compositor integration
Performance matters - GPU acceleration for real-time transcription
Extensible - Multiple engines, output modes, and post-processing options
Keyboard-driven - Pure CLI, no GUI required

Next steps

Quick start

Get Voxtype running in 5 minutes

Installation

Install on your distribution

Basic usage

Learn push-to-talk controls

Configuration

Customize to your workflow
