Function Signature
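As a reference point, the signature below follows `whisper.load_model` as defined in the library source at the time of writing; the string annotations stand in for `torch.device` and the `Whisper` model class, which come from `torch` and `whisper` when actually used.

```python
from typing import Optional, Union

# Sketch of the whisper.load_model signature (for reference only;
# "torch.device" and "Whisper" are provided by torch and whisper).
def load_model(
    name: str,
    device: Optional[Union[str, "torch.device"]] = None,
    download_root: str = None,
    in_memory: bool = False,
) -> "Whisper":
    ...
```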
Parameters
name : str
    One of the official model names listed by whisper.available_models(), or the path to a model checkpoint containing the model dimensions and the model state_dict.
    Available official models: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2, large-v3, large, large-v3-turbo, turbo. Models with the .en suffix are English-only variants.

device : str or torch.device, optional
    The PyTorch device to put the model into. If not specified, automatically selects "cuda" if available, otherwise "cpu". Common values: "cuda", "cpu", "cuda:0", torch.device("cuda").

download_root : str, optional
    Path to download the model files to. By default, uses "~/.cache/whisper" (or $XDG_CACHE_HOME/whisper if the environment variable is set). The function creates this directory if it doesn't exist.

in_memory : bool, optional
    Whether to preload the model weights into host memory. When True, the model checkpoint is kept in memory as bytes rather than being read from disk. This can be useful for deployment scenarios where filesystem access is limited.

Returns

The Whisper ASR model instance, ready for inference. The model is already loaded onto the specified device and has its state_dict loaded. The returned model has alignment heads set if it is an official model (used for word-level timestamps).
Example
Notes
Model Download and Caching
- On first use, models are downloaded from Azure CDN to the cache directory
- Downloaded models are verified using SHA256 checksums
- Subsequent calls reuse the cached model files
- If checksum verification fails, the model is re-downloaded automatically
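The default cache location described above can be resolved with plain stdlib calls. This is a sketch of the documented default, not the library's own code, which should be treated as authoritative:

```python
import os

def default_download_root() -> str:
    # Mirrors the documented default: $XDG_CACHE_HOME/whisper if the
    # environment variable is set, otherwise ~/.cache/whisper.
    cache_home = os.getenv("XDG_CACHE_HOME") or os.path.expanduser("~/.cache")
    return os.path.join(cache_home, "whisper")

print(default_download_root())
```

Passing `download_root` to `load_model` overrides this location entirely.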
Model Selection
Choose a model based on your requirements:
- Speed: tiny (fastest) → turbo → base → small → medium → large (slowest)
- Accuracy: tiny (lowest) → base → small → medium → turbo → large (highest)
- English-only: use .en variants for better English performance
- Multilingual: use non-.en models for 99+ languages
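The guidance above can be encoded as a small helper. `pick_model` is a hypothetical name, not part of the whisper API; the mapping simply follows the speed/accuracy ordering in this section:

```python
def pick_model(english_only: bool, quality: str = "balanced") -> str:
    """Map coarse requirements onto an official model name.

    Hypothetical helper mirroring the selection guidance; adjust the
    mapping to taste.
    """
    base = {"fast": "tiny", "balanced": "small", "accurate": "large"}[quality]
    # .en variants exist for tiny, base, small, and medium only.
    if english_only and base in {"tiny", "base", "small", "medium"}:
        return base + ".en"
    return base

print(pick_model(english_only=True, quality="fast"))       # tiny.en
print(pick_model(english_only=False, quality="accurate"))  # large
```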
Device Compatibility
- FP16 (half precision) is only supported on CUDA devices
- CPU inference automatically uses FP32 even if FP16 is requested
- A warning is emitted when inference runs on CPU while CUDA is available
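The FP16/FP32 rule above can be sketched as a small check. `effective_precision` is a hypothetical helper that mirrors the described fallback, not a whisper function:

```python
def effective_precision(device: str, fp16_requested: bool = True) -> str:
    # FP16 (half precision) is only supported on CUDA devices; CPU
    # inference falls back to FP32 even when FP16 is requested.
    if fp16_requested and device.startswith("cuda"):
        return "fp16"
    return "fp32"

print(effective_precision("cuda"))  # fp16
print(effective_precision("cpu"))   # fp32
```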
Error Handling
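load_model raises RuntimeError when name is neither an official model name nor a path to an existing checkpoint file. A defensive caller might validate up front; the sketch below is a hypothetical helper that mirrors that check, not the library's code:

```python
import os

OFFICIAL_MODELS = {
    "tiny.en", "tiny", "base.en", "base", "small.en", "small",
    "medium.en", "medium", "large-v1", "large-v2", "large-v3",
    "large", "large-v3-turbo", "turbo",
}

def validate_model_name(name: str) -> None:
    # Accept an official model name or a path to an existing checkpoint.
    if name not in OFFICIAL_MODELS and not os.path.isfile(name):
        raise RuntimeError(
            f"Model {name!r} not found; available models = {sorted(OFFICIAL_MODELS)}"
        )

validate_model_name("base")  # ok, no exception
try:
    validate_model_name("giant")
except RuntimeError as err:
    print(err)
```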
Alignment Heads
Official models automatically have alignment heads configured, which are used for:
- Word-level timestamp extraction
- Cross-attention pattern analysis
- Dynamic time warping for precise timing