Available Models
Relative speeds are measured by transcribing English speech on an A100 GPU. Real-world performance varies based on language, speaking speed, and hardware.
Model Specifications
| Size | Parameters | English-only | Multilingual | Required VRAM | Relative Speed |
|---|---|---|---|---|---|
| tiny | 39 M | tiny.en | tiny | ~1 GB | ~10x |
| base | 74 M | base.en | base | ~1 GB | ~7x |
| small | 244 M | small.en | small | ~2 GB | ~4x |
| medium | 769 M | medium.en | medium | ~5 GB | ~2x |
| large | 1550 M | N/A | large | ~10 GB | 1x |
| turbo | 809 M | N/A | turbo | ~6 GB | ~8x |
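As a rough illustration of how to read the Relative Speed column, the sketch below converts a time measured with large into estimates for the other sizes. The function name `estimate_seconds` and the 120-second baseline are hypothetical; the multipliers are the approximate values from the table, so the results are ballpark figures only.

```python
# Approximate speed multipliers relative to `large`, copied from the table above.
RELATIVE_SPEED = {
    "tiny": 10.0,
    "base": 7.0,
    "small": 4.0,
    "medium": 2.0,
    "large": 1.0,
    "turbo": 8.0,
}


def estimate_seconds(large_seconds: float, size: str) -> float:
    """Estimate transcription time for `size` from a time measured with `large`."""
    return large_seconds / RELATIVE_SPEED[size]


# Hypothetical baseline: `large` needed 120 s for some recording.
for size in RELATIVE_SPEED:
    print(f"{size:>6}: ~{estimate_seconds(120.0, size):.0f} s")
```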
Model Variants
Tiny (39M parameters)
Best for: Real-time applications with limited resources
- VRAM: ~1 GB
- Speed: 10x faster than large
- Models: tiny.en (English), tiny (Multilingual)
Base (74M parameters)
Best for: Fast transcription with acceptable accuracy
- VRAM: ~1 GB
- Speed: 7x faster than large
- Models: base.en (English), base (Multilingual)
Small (244M parameters)
Best for: Balanced performance and resource usage
- VRAM: ~2 GB
- Speed: 4x faster than large
- Models: small.en (English), small (Multilingual)
Medium (769M parameters)
Best for: High accuracy with moderate speed
- VRAM: ~5 GB
- Speed: 2x faster than large
- Models: medium.en (English), medium (Multilingual)
Large (1550M parameters)
Best for: Maximum accuracy, translation tasks
- VRAM: ~10 GB
- Speed: Baseline (1x)
- Models: large (Multilingual only)
- Versions: large-v1, large-v2, large-v3
The large model alias points to large-v3, the latest version.
Turbo (809M parameters)
Best for: Fast, accurate transcription (default model)
- VRAM: ~6 GB
- Speed: 8x faster than large
- Models: turbo (Multilingual only)
- Based on: Optimized large-v3
English-only vs Multilingual
When to Use English-only Models
Better Performance on English
The .en models perform better on English audio, especially for tiny.en and base.en. The difference becomes less significant for larger models.
Available Sizes
English-only models are available for: tiny, base, small, and medium sizes.
When to Use Multilingual Models
Required for:
- Non-English transcription
- Translation to English
- Language identification
- Multilingual applications
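The rules in the two sections above can be sketched as a small name resolver. This is a minimal illustration, not part of any library: `resolve_model_name` and its parameters are hypothetical, and it treats an unknown language (`None`) as a case that needs language identification.

```python
from typing import Optional

# Sizes that ship an English-only (.en) variant, per the section above.
EN_SIZES = {"tiny", "base", "small", "medium"}


def resolve_model_name(
    size: str,
    language: Optional[str] = "en",
    task: str = "transcribe",
) -> str:
    """Return the model name for `size`, using the .en variant only when it applies."""
    english_only_ok = (
        language == "en"          # audio is known to be English
        and task == "transcribe"  # translation requires a multilingual model
        and size in EN_SIZES      # large and turbo have no .en variant
    )
    return f"{size}.en" if english_only_ok else size


print(resolve_model_name("base"))                    # base.en
print(resolve_model_name("base", language="ja"))     # base
print(resolve_model_name("small", task="translate")) # small
print(resolve_model_name("turbo"))                   # turbo
```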
Model Selection Guide
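One way to turn the specifications table into a selection rule is to filter by available VRAM and then rank the remaining sizes. The sketch below is only an illustration: `ModelSpec` and `pick_model` are hypothetical names, the numbers are the approximate values from the table, and using parameter count as a stand-in for accuracy is an assumption rather than a measured ranking.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    name: str
    params_m: int  # parameters, in millions
    vram_gb: int   # approximate required VRAM
    speed: int     # approximate speed relative to large


# Values copied from the Model Specifications table.
MODELS = [
    ModelSpec("tiny", 39, 1, 10),
    ModelSpec("base", 74, 1, 7),
    ModelSpec("small", 244, 2, 4),
    ModelSpec("medium", 769, 5, 2),
    ModelSpec("large", 1550, 10, 1),
    ModelSpec("turbo", 809, 6, 8),
]


def pick_model(vram_gb: float, prefer: str = "accuracy") -> Optional[str]:
    """Return the preferred model that fits in `vram_gb`, or None if none fits."""
    if prefer not in ("accuracy", "speed"):
        raise ValueError("prefer must be 'accuracy' or 'speed'")
    fitting = [m for m in MODELS if m.vram_gb <= vram_gb]
    if not fitting:
        return None
    if prefer == "speed":
        best = max(fitting, key=lambda m: m.speed)
    else:
        # Assumption: more parameters is used as a proxy for higher accuracy.
        best = max(fitting, key=lambda m: m.params_m)
    return best.name


print(pick_model(6))           # turbo
print(pick_model(2, "speed"))  # tiny
```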
Loading Models
Basic Loading
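A minimal sketch of loading a model by name, assuming the models on this page are those of the openai-whisper Python package and that it is installed. The wrapper `load_model` is a hypothetical helper; the import is deferred so the function can be defined without the package present.

```python
def load_model(name: str = "turbo"):
    """Load a checkpoint by name; the weights are downloaded on first use."""
    import whisper  # lazy import: requires the openai-whisper package

    return whisper.load_model(name)


# Usage ("audio.mp3" is a placeholder path):
# model = load_model("turbo")
# result = model.transcribe("audio.mp3")
# print(result["text"])
```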
With Device Selection
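A sketch of choosing the device explicitly, again assuming the openai-whisper package, whose `load_model` accepts a `device` argument. `pick_device` and `load_on_device` are hypothetical helpers; `pick_device` falls back to the CPU when PyTorch is missing or reports no CUDA GPU.

```python
from typing import Optional


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_on_device(name: str = "turbo", device: Optional[str] = None):
    """Load `name` on the given device, or on the one chosen by pick_device()."""
    import whisper  # lazy import: requires the openai-whisper package

    return whisper.load_model(name, device=device or pick_device())


# Usage:
# model = load_on_device("small", device="cpu")
```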
Custom Download Location
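A sketch of storing checkpoints in a directory of your choice through the `download_root` argument of `whisper.load_model`, assuming the openai-whisper package. The helper names and the example directory are hypothetical.

```python
from pathlib import Path


def prepare_download_root(directory: str) -> Path:
    """Expand `~` and create the directory that will hold the checkpoints."""
    root = Path(directory).expanduser()
    root.mkdir(parents=True, exist_ok=True)
    return root


def load_from(directory: str, name: str = "turbo"):
    """Load `name`, downloading its weights into `directory` if needed."""
    import whisper  # lazy import: requires the openai-whisper package

    root = prepare_download_root(directory)
    return whisper.load_model(name, download_root=str(root))


# Usage (the directory is a placeholder):
# model = load_from("~/models/whisper", name="base")
```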
Available Models
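The openai-whisper package exposes `whisper.available_models()`, which returns the model names it can load. The sketch below assumes that package and adds a hypothetical helper, `split_by_language`, that groups names by the `.en` suffix described earlier.

```python
from typing import Iterable, List, Tuple


def split_by_language(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split model names into (english_only, multilingual) by the `.en` suffix."""
    english_only: List[str] = []
    multilingual: List[str] = []
    for name in names:
        if name.endswith(".en"):
            english_only.append(name)
        else:
            multilingual.append(name)
    return english_only, multilingual


# Usage with the installed package:
# import whisper
# english_only, multilingual = split_by_language(whisper.available_models())
# print(english_only)
# print(multilingual)
```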
Model Versions
The large model has multiple versions with improvements:
- large-v1: Original large model
- large-v2: Improved accuracy
- large-v3: Latest version with best accuracy
- turbo: Optimized large-v3 for speed