Overview
Performs offline speech recognition using OpenAI’s Whisper model running locally on your machine. No internet connection or API key is required.

Method Signature
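A sketch of the signature, reconstructed from the parameter descriptions in this document; exact parameter names and defaults may vary between library versions, so treat this as illustrative rather than authoritative:

```python
# Hypothetical sketch of recognize_whisper, reconstructed from the
# parameter descriptions in this document (defaults assumed, not verified).
def recognize_whisper(
    self,
    audio_data,            # AudioData instance to transcribe
    model="base",          # Whisper model size, "tiny" through "large-v3"
    show_dict=False,       # True: return full result dict; False: text only
    load_options=None,     # dict: device, download_root, in_memory
    language=None,         # full lowercase name, e.g. "english"; None = auto
    **transcribe_options,  # forwarded to Whisper: task, temperature, fp16, ...
):
    ...
```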
Parameters

audio_data: The audio data to recognize. Must be an `AudioData` instance.

model: Whisper model size to use. Options:
- `"tiny"` - Smallest, fastest, least accurate (~1GB RAM)
- `"base"` - Good balance of speed and accuracy (~1GB RAM)
- `"small"` - Better accuracy (~2GB RAM)
- `"medium"` - High accuracy (~5GB RAM)
- `"large"` - Best accuracy (~10GB RAM)
- `"large-v2"` - Improved large model
- `"large-v3"` - Latest large model
show_dict: If `True`, returns the full result dictionary including detected language, segments, and timing. If `False`, returns only the transcription text.

load_options: Optional parameters for loading the model:
- `device`: Device to use (`"cpu"`, `"cuda"`, or a `torch.device` object)
- `download_root`: Directory to download models to
- `in_memory`: Whether to load the model in memory
language: Recognition language as a full language name (lowercase): `"english"`, `"spanish"`, `"french"`, `"german"`, `"chinese"`, etc. If not specified, Whisper will automatically detect the language. See the Whisper language list for all supported languages.

task: The operation to perform:
- `"transcribe"` - Transcribe audio in its original language
- `"translate"` - Transcribe and translate to English
temperature: Sampling temperature for generation. Can be:
- A single float value
- A tuple of temperatures to try (falls back to the next value if generation fails)

fp16: Whether to use FP16 precision. Automatically enabled if CUDA is available.
Returns

str: The transcribed text, when `show_dict=False`.

dict: The full transcription result, when `show_dict=True`, containing:
- `text`: Complete transcription
- `segments`: List of segments with timing and text
- `language`: Detected language code
Exceptions

Raised when:
- The `whisper` module is not installed
- The model download fails
- There is insufficient memory for the model
Example Usage
Basic Local Recognition
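A minimal sketch of basic usage (the `basic_recognition` wrapper is hypothetical; it assumes `speech_recognition`, `openai-whisper`, and PyAudio for microphone access are installed, so the imports live inside the function):

```python
def basic_recognition():
    """Transcribe speech from the default microphone, fully offline."""
    import speech_recognition as sr  # assumed installed

    r = sr.Recognizer()
    with sr.Microphone() as source:  # microphone capture needs PyAudio
        print("Say something!")
        audio = r.listen(source)
    # The first call downloads the default "base" model; later calls are offline
    print("Whisper thinks you said:", r.recognize_whisper(audio))
```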
Using Different Model Sizes
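A sketch comparing model sizes on the same audio (the `compare_model_sizes` helper is hypothetical; `audio` is assumed to be an `AudioData` instance captured as in the basic example):

```python
def compare_model_sizes(audio):
    """Transcribe the same AudioData with several Whisper model sizes."""
    import speech_recognition as sr  # assumed installed

    r = sr.Recognizer()
    results = {}
    for model in ("tiny", "base", "small"):
        # Larger models are slower but more accurate (see the model table)
        results[model] = r.recognize_whisper(audio, model=model)
    return results
```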
With Language Specification
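Specifying the language skips auto-detection and can improve accuracy; a sketch (the helper name is hypothetical):

```python
def transcribe_spanish(audio):
    """Transcribe Spanish audio with the language fixed in advance."""
    import speech_recognition as sr  # assumed installed

    r = sr.Recognizer()
    # Full lowercase language name, per the language parameter above
    return r.recognize_whisper(audio, language="spanish")
```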
Automatic Language Detection
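When no language is given, Whisper detects it; `show_dict=True` exposes the detected language code. A sketch (the helper name is hypothetical):

```python
def transcribe_auto_detect(audio):
    """Let Whisper detect the language and report what it found."""
    import speech_recognition as sr  # assumed installed

    r = sr.Recognizer()
    result = r.recognize_whisper(audio, show_dict=True)
    print("Detected language:", result["language"])
    return result["text"]
```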
Translation to English
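A sketch of translating non-English speech into English text (the helper name is hypothetical, and the task-selection keyword varies by library version, so check your installed API):

```python
def translate_to_english(audio):
    """Transcribe audio and translate the result into English."""
    import speech_recognition as sr  # assumed installed

    r = sr.Recognizer()
    # Some library versions take a boolean translate=True flag instead
    return r.recognize_whisper(audio, task="translate")
```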
With Segment Timing Information
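With `show_dict=True`, each entry in `segments` carries its own timing; a sketch that prints start/end times in seconds (the helper name is hypothetical):

```python
def transcribe_with_segments(audio):
    """Print per-segment timing alongside the transcribed text."""
    import speech_recognition as sr  # assumed installed

    r = sr.Recognizer()
    result = r.recognize_whisper(audio, show_dict=True)
    for seg in result["segments"]:
        # Each segment has start/end times (seconds) and its text
        print(f"[{seg['start']:.2f}s -> {seg['end']:.2f}s] {seg['text']}")
    return result
```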
Using GPU Acceleration
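A sketch of forcing the model onto an NVIDIA GPU via `load_options` (the helper name is hypothetical; assumes a CUDA-enabled PyTorch build):

```python
def transcribe_on_gpu(audio):
    """Run a larger model on the GPU for faster transcription."""
    import speech_recognition as sr  # assumed installed

    r = sr.Recognizer()
    # load_options is applied when the Whisper model is loaded
    return r.recognize_whisper(
        audio, model="medium", load_options={"device": "cuda"}
    )
```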
From Audio File
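A sketch of transcribing a file instead of live microphone input (the helper name is hypothetical; `AudioFile` reads WAV, AIFF, and FLAC):

```python
def transcribe_file(path):
    """Transcribe an audio file from disk."""
    import speech_recognition as sr  # assumed installed

    r = sr.Recognizer()
    with sr.AudioFile(path) as source:  # WAV/AIFF/FLAC are supported
        audio = r.record(source)        # read the entire file into AudioData
    return r.recognize_whisper(audio)
```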
Custom Temperature Settings
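A sketch of passing a temperature schedule: with a tuple, Whisper retries at the next temperature whenever decoding at the current one fails (the helper name is hypothetical):

```python
def transcribe_with_fallback(audio):
    """Use a temperature fallback schedule for more robust decoding."""
    import speech_recognition as sr  # assumed installed

    r = sr.Recognizer()
    # Starts deterministic (0.0) and falls back to higher temperatures
    return r.recognize_whisper(
        audio, temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
    )
```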
Installation
Basic Installation
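Assuming the standard PyPI package names for the speech recognition library and OpenAI's Whisper:

```shell
pip install SpeechRecognition
pip install openai-whisper
```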
With GPU Support (NVIDIA)
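A sketch of a CUDA-enabled install; the exact index URL depends on your CUDA version, so check the PyTorch install selector for the right command:

```shell
# Install a CUDA-enabled PyTorch build first (URL shown for CUDA 12.1),
# then install Whisper on top of it
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install openai-whisper
```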
System Requirements
- Python: 3.8 or later
- RAM:
  - Tiny/Base: 1GB
  - Small: 2GB
  - Medium: 5GB
  - Large: 10GB
- GPU (optional): NVIDIA GPU with CUDA for faster processing
Available Models
| Model | Parameters | RAM Required | Relative Speed |
|---|---|---|---|
| tiny | 39M | ~1GB | ~32x |
| base | 74M | ~1GB | ~16x |
| small | 244M | ~2GB | ~6x |
| medium | 769M | ~5GB | ~2x |
| large | 1550M | ~10GB | 1x |
Language Support
Whisper supports 99 languages, including:
- `english`, `spanish`, `french`, `german`, `italian`
- `portuguese`, `dutch`, `russian`, `polish`
- `chinese`, `japanese`, `korean`
- `arabic`, `turkish`, `vietnamese`
- `hindi`, `indonesian`, `thai`
- And many more…
Notes
- Works completely offline (no internet required after model download)
- Models are cached after first download
- GPU significantly speeds up transcription (10-30x faster)
- Larger models are more accurate but slower
- Language auto-detection works well but specifying language improves accuracy
- The `translate` task always outputs English text
- Word-level timestamps are available in `show_dict=True` mode