## Why GPU Acceleration Matters

Without GPU acceleration, transcription speed is limited by CPU performance. For short dictation this is usually fine, but longer recordings or large models can take several seconds to transcribe. With GPU acceleration:

- Large models become practical: large-v3-turbo runs in under a second instead of 15+ seconds
- Better battery life: the GPU is more power-efficient than maxing out the CPU
- Lower latency: sub-second transcription feels instant
## Performance Comparison

Example on an AMD RX 6800:

| Model | CPU (8-core) | Vulkan GPU |
|---|---|---|
| base.en | ~7x realtime | ~35x realtime |
| large-v3 | ~1x realtime | ~5x realtime |
In absolute terms, for a 5-second recording:

- CPU: base.en ~0.7s, large-v3 ~5s
- GPU: base.en ~0.14s, large-v3 ~1s
## Supported Backends

Voxtype supports multiple GPU backends depending on how you build or install it.

| Backend | Hardware | Packages | Build Flag |
|---|---|---|---|
| Vulkan | AMD, NVIDIA, Intel | Included in packages | --features gpu-vulkan |
| CUDA | NVIDIA | Build from source | --features gpu-cuda |
| Metal | Apple Silicon | Build from source | --features gpu-metal |
| ROCm/HIP | AMD | Build from source | --features gpu-hipblas |
### Vulkan (Recommended)

Vulkan works across all major GPU vendors and is included in the official packages.

Supported GPUs:

- AMD Radeon RX 5000+, Vega
- NVIDIA GTX 900+, RTX series
- Intel Arc, Iris Xe

Advantages:

- Cross-platform: works on AMD, NVIDIA, and Intel
- Included in packages: no need to build from source
- Good compatibility: broad driver support
### CUDA (NVIDIA)

CUDA provides the best performance on NVIDIA GPUs but requires building from source.

Supported GPUs: NVIDIA GTX 900+, RTX series, Tesla, Quadro

Advantages:

- Fastest on NVIDIA hardware
- Most mature GPU acceleration in whisper.cpp
### Metal (macOS)

Metal is the native GPU API for macOS and Apple Silicon.

Supported hardware: M1, M2, M3 chips

To build with Metal, use the --features gpu-metal flag when building from source.
### ROCm/HIP (AMD)

ROCm is AMD’s alternative to CUDA for compute workloads.

Supported GPUs: AMD Radeon RX 5000+, Vega, Instinct

To build with ROCm, use the --features gpu-hipblas flag when building from source.

## Enabling GPU Acceleration
### Using Packages (Vulkan)

The packaged binaries include a Vulkan-enabled variant. Use the setup tool to switch to it.

### Manual Binary Selection
If you downloaded binaries manually, install the GPU-enabled variant in place of the default binary.

### Building from Source
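The --features flags in the backend table suggest a Cargo build. A minimal sketch, assuming a local checkout of the voxtype source, the Rust toolchain, and the relevant GPU SDK installed:

```shell
# Build a GPU-enabled binary from a voxtype source checkout.
# Choose ONE feature flag from the backend table:
#   gpu-vulkan | gpu-cuda | gpu-metal | gpu-hipblas
if command -v cargo >/dev/null 2>&1 && [ -f Cargo.toml ]; then
  cargo build --release --features gpu-vulkan
else
  echo "run this inside a voxtype checkout with the Rust toolchain installed"
fi
```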
## Verifying GPU Acceleration

Check the daemon logs to confirm the GPU is active; the backend field reports which backend is in use.
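One way to check, assuming voxtype runs as a systemd user service named voxtype (adjust the unit name and log wording to your setup):

```shell
# Search recent daemon logs for the reported backend (e.g. "backend: Vulkan").
out=$(journalctl --user -u voxtype -n 200 --no-pager 2>/dev/null | grep -i backend)
echo "${out:-no backend line found; is the voxtype service running?}"
```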
## GPU Memory Management

GPU-accelerated models load into VRAM and stay resident by default for fast repeated transcriptions.

### On-Demand Loading
To free GPU memory when not transcribing, enable on-demand loading.
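A sketch of the setting; the key name on_demand_loading comes from this guide, but its exact placement in the file is an assumption — check your voxtype config for the layout:

```toml
# Sketch: free VRAM between transcriptions by loading the model on demand.
on_demand_loading = true
```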
### GPU Isolation Mode

Run transcription in a subprocess that exits after each recording, fully releasing GPU memory. This is useful for:

- Laptops with hybrid graphics (the discrete GPU can power down)
- Setups where battery life is a priority
- Systems running multiple GPU applications
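A sketch of the setting; the key name gpu_isolation comes from this guide, but its exact placement in the file is an assumption — check your voxtype config for the layout:

```toml
# Sketch: run transcription in an isolated subprocess so GPU memory is
# fully released after each recording.
gpu_isolation = true
```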
Example with large-v3-turbo:

| Mode | Transcription Latency | Idle RAM | Idle GPU Memory |
|---|---|---|---|
| Standard | 0.49s avg | ~1.6 GB | 409 MB |
| GPU Isolation | 0.50s avg | 0 | 0 |
### Choosing Memory Settings

| Scenario | Recommendation |
|---|---|
| Desktop with dedicated GPU | Default (keep model loaded) |
| Laptop with discrete GPU | gpu_isolation = true |
| Limited VRAM | on_demand_loading = true |
| Running other GPU apps | gpu_isolation = true or on_demand_loading = true |
| Want instant response | Default |
Multi-GPU Systems
If you have multiple GPUs, you can select which one to use.AMD (Vulkan)
Set the VOXTYPE_VULKAN_DEVICE environment variable:
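For example (the index is assumed to follow the order devices appear in Vulkan's enumeration, with 0 as the first device):

```shell
# Select the second Vulkan device (index 1) for the current session.
export VOXTYPE_VULKAN_DEVICE=1
echo "$VOXTYPE_VULKAN_DEVICE"
```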
To make the setting persistent, add it to ~/.config/voxtype/voxtype.env (if using the systemd service).
### NVIDIA (CUDA)

CUDA automatically selects GPU 0. To use a different GPU, set the standard CUDA_VISIBLE_DEVICES environment variable (for example, CUDA_VISIBLE_DEVICES=1).

## Troubleshooting
### GPU not detected

Symptom: logs show “CPU” backend instead of “Vulkan” or “CUDA”.

Solutions:

1. Verify you are using a GPU-enabled binary.
2. Check that Vulkan drivers are installed.
3. Verify the GPU is accessible.
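For steps 2 and 3, vulkaninfo (from the vulkan-tools package on most distributions) can confirm both the drivers and device visibility:

```shell
# List Vulkan-capable devices; an empty result means drivers are missing
# or the GPU is not visible to Vulkan.
if command -v vulkaninfo >/dev/null 2>&1; then
  vulkaninfo --summary 2>/dev/null | grep -i 'deviceName' || echo "no Vulkan device found"
else
  echo "vulkaninfo not installed (package: vulkan-tools on most distros)"
fi
```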
### Poor GPU performance

Check for CPU fallback: if the GPU backend fails to initialize, whisper.cpp silently falls back to CPU. View the full logs to confirm.

Common causes:

- Old GPU drivers
- Insufficient VRAM (need ~2 GB for large models)
- GPU in use by another application
### “Out of memory” errors

Reduce the model size, or enable on_demand_loading / gpu_isolation to manage GPU memory.

### Hybrid graphics laptop issues
Ensure the discrete GPU is active. If running under the systemd user service, set the needed environment variables in ~/.config/systemd/user/voxtype.service.d/override.conf.
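A sketch of such an override, assuming a Mesa PRIME hybrid setup where DRI_PRIME=1 selects the discrete GPU (NVIDIA proprietary setups use different variables):

```ini
# ~/.config/systemd/user/voxtype.service.d/override.conf
[Service]
Environment=DRI_PRIME=1
```

After editing, run systemctl --user daemon-reload and restart the service.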
## Benchmarking

Test transcription speed with a sample audio file and compare the CPU and GPU builds.
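A sketch using the shell's time keyword; the transcribe-a-file invocation is hypothetical — consult voxtype --help for the actual subcommand:

```shell
# Time a one-shot transcription (hypothetical "voxtype transcribe" CLI).
if command -v voxtype >/dev/null 2>&1; then
  time voxtype transcribe sample.wav
else
  echo "voxtype not found in PATH"
fi
```

Run the same file against the CPU and GPU binaries to compare realtime factors.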
## ONNX Engine GPU Support

The ONNX engines (Parakeet, Moonshine, SenseVoice, etc.) also support GPU acceleration via ONNX Runtime.

### CUDA-accelerated ONNX
Download the CUDA-enabled ONNX binary.

### ROCm-accelerated ONNX

For AMD GPUs, use the ROCm-enabled ONNX binary.
### Performance

ONNX engines with GPU acceleration:

| Engine | CPU | CUDA GPU |
|---|---|---|
| Parakeet TDT | ~30x realtime | ~80x realtime |
| Moonshine base | ~40x realtime | ~120x realtime |
## Best Practices

- Use Vulkan for most cases: Works across AMD, NVIDIA, Intel with good performance
- CUDA for NVIDIA: Slightly faster if you’re building from source
- Enable GPU isolation on laptops: Saves battery when not transcribing
- Use large models with GPU: Take advantage of the speed to get better accuracy
- Test before committing: Try GPU acceleration with your hardware to ensure it helps
## Further Reading

- Transcription Engines - Choose the right engine
- Configuration guide - GPU settings documentation
- Troubleshooting Guide - Common GPU issues