Installation Issues
ffmpeg not found
Whisper requires the ffmpeg command-line tool to be installed on your system.
Solution:
Install ffmpeg using your system’s package manager. After installation, verify that ffmpeg is accessible, for example by running ffmpeg -version.
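The accessibility check can also be done from Python, which is useful because Whisper looks for ffmpeg on the PATH of the process it runs in. The sketch below is illustrative only: the helper names `find_ffmpeg` and `ffmpeg_version_line` are made up for this example and are not part of Whisper.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(path):
    """Run `ffmpeg -version` and return the first line of its output."""
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    return completed.stdout.splitlines()[0]


if __name__ == "__main__":
    ffmpeg_path = find_ffmpeg()
    if ffmpeg_path is None:
        print("ffmpeg not found on PATH; install it with your package manager")
    else:
        print(ffmpeg_version_line(ffmpeg_path))
```

If `find_ffmpeg()` returns None in the same environment where Whisper fails, the problem is most likely the PATH rather than Whisper itself.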
tiktoken installation fails
Whisper depends on tiktoken for fast tokenization. If tiktoken doesn’t provide a pre-built wheel for your platform, you may need Rust installed.
Symptoms:
- Installation errors during pip install
- Messages about a missing Rust compiler
Solution:
- Install Rust by following the Getting Started guide
- Configure your PATH environment variable, for example export PATH="$HOME/.cargo/bin:$PATH"
- If you see No module named 'setuptools_rust', install it with pip install setuptools-rust
- Retry the Whisper installation
No module named 'setuptools_rust'
This error occurs when tiktoken needs to be built from source but setuptools_rust is not installed.
Solution:
Install it with pip install setuptools-rust, then retry the Whisper installation.
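Before retrying the installation, it can help to confirm which build prerequisites are actually visible to Python. The following is a minimal diagnostic sketch, not part of Whisper or tiktoken. It assumes Rust was installed with rustup in its default location (`~/.cargo/bin`), and the function name `tiktoken_build_report` is hypothetical.

```python
import importlib.util
import os
import shutil


def tiktoken_build_report(environ=None):
    """Report which prerequisites for building tiktoken from source are present."""
    environ = os.environ if environ is None else environ
    search_path = environ.get("PATH", "")
    # Assumption: rustup's default install directory for cargo and rustc.
    cargo_bin = os.path.join(os.path.expanduser("~"), ".cargo", "bin")
    return {
        # Is the Rust compiler reachable through the given PATH?
        "rustc_on_path": shutil.which("rustc", path=search_path) is not None,
        # Is rustup's default bin directory listed in PATH at all?
        "cargo_bin_on_path": cargo_bin in search_path.split(os.pathsep),
        # Can Python import the setuptools_rust build helper?
        "setuptools_rust_installed": importlib.util.find_spec("setuptools_rust")
        is not None,
    }


if __name__ == "__main__":
    for check, passed in tiktoken_build_report().items():
        print(f"{check}: {'ok' if passed else 'missing'}")
```

Any entry reported as missing points at the corresponding step: installing Rust, fixing PATH, or installing setuptools-rust.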
Runtime Issues
CUDA out of memory / GPU memory error
This occurs when the selected model requires more VRAM than your GPU has available.
VRAM Requirements:
- tiny, base: ~1 GB
- small: ~2 GB
- medium: ~5 GB
- turbo: ~6 GB
- large: ~10 GB
Solutions:
- Use a smaller model (for example, --model small)
- Use CPU instead of GPU (for example, --device cpu). Note: CPU inference will be significantly slower.
- Close other GPU-intensive applications to free up VRAM
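The choice between a smaller model and the CPU can be automated from the approximate VRAM figures listed in this section. This is a rough heuristic sketch, not Whisper's own logic: `choose_device_and_model` is a hypothetical helper, and the amount of free VRAM has to come from elsewhere (recent PyTorch versions expose it through `torch.cuda.mem_get_info()`, which reports free and total bytes).

```python
# Approximate VRAM needs in GB, copied from the VRAM Requirements list,
# ordered from smallest to largest footprint.
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "turbo": 6, "large": 10}


def models_that_fit(free_vram_gb):
    """Return the model names whose approximate VRAM need fits, smallest first."""
    return [name for name, need in VRAM_GB.items() if need <= free_vram_gb]


def choose_device_and_model(free_vram_gb, preferred="large"):
    """Pick (device, model) for a given amount of free VRAM in GB.

    Pass None when no GPU is available. The preferred model is kept if it fits;
    otherwise the largest-footprint model that fits is used; if nothing fits,
    the preferred model runs on the CPU (slowly).
    """
    if free_vram_gb is None:
        return ("cpu", preferred)
    fitting = models_that_fit(free_vram_gb)
    if preferred in fitting:
        return ("cuda", preferred)
    if fitting:
        return ("cuda", fitting[-1])
    return ("cpu", preferred)
```

For example, `choose_device_and_model(6)` selects turbo on the GPU. Footprint is only a proxy for quality, and the listed figures are approximate, so leave some headroom when the fit is tight.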
Model downloads fail or are very slow
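Model weights are downloaded from the internet on first use.
Solutions:
- Check your internet connection
- Use a different download location if your home directory has limited space, via the --model_dir option or the download_root argument of whisper.load_model()
- Set XDG_CACHE_HOME to relocate the cache: when the variable is set, Whisper stores models under $XDG_CACHE_HOME/whisper instead of ~/.cache/whisper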
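To see where model files will land before starting a large download, the lookup can be reproduced in a few lines. The sketch below mirrors the default behavior as understood from current Whisper releases (XDG_CACHE_HOME if set, otherwise `~/.cache`, with a `whisper` subdirectory). The function `whisper_download_root` is a hypothetical helper written for this example, not a Whisper API.

```python
import os


def whisper_download_root(environ=None, home=None):
    """Return the directory where Whisper is expected to cache model weights.

    Assumption: the default is $XDG_CACHE_HOME/whisper when XDG_CACHE_HOME is
    set, and ~/.cache/whisper otherwise.
    """
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home
    default_cache = os.path.join(home, ".cache")
    return os.path.join(environ.get("XDG_CACHE_HOME", default_cache), "whisper")


if __name__ == "__main__":
    print("Models will be cached in:", whisper_download_root())
```

To override the location for a single call instead, pass a directory explicitly, for example `whisper.load_model("small", download_root="/data/whisper-models")`, where the path is only a placeholder.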
Audio file format not supported
Whisper uses ffmpeg to handle audio files. Most formats are supported, but some may cause issues.
Solution:
Convert your audio file to a widely supported format like WAV or MP3, then transcribe the converted file.
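A conversion step can be scripted by building the ffmpeg command in Python. The sketch below targets 16 kHz mono 16-bit PCM WAV, chosen here because Whisper resamples audio to 16 kHz mono when loading it; other targets work as well. The helper names and the `_converted` file suffix are assumptions made for this example.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, target=None):
    """Build an ffmpeg command that converts `source` to 16 kHz mono PCM WAV."""
    source = Path(source)
    if target is None:
        # Write next to the source under a new name so the input is never overwritten.
        target = source.with_name(source.stem + "_converted.wav")
    return [
        "ffmpeg",
        "-y",                 # overwrite the target file if it already exists
        "-i", str(source),    # input file in any format ffmpeg can decode
        "-ac", "1",           # downmix to one channel (mono)
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # encode as 16-bit little-endian PCM
        str(target),
    ]


def convert_audio(source, target=None):
    """Run the conversion and return the path of the converted file."""
    command = build_convert_command(source, target)
    subprocess.run(command, check=True)
    return command[-1]
```

Calling `convert_audio("talk.m4a")` would write `talk_converted.wav`, which can then be passed to Whisper in place of the original file.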
Empty transcription or no speech detected
This can happen with audio files that contain no speech or very low-quality audio.
Solutions:
- Verify that the audio file contains audible speech
- Check audio levels; the audio may be too quiet
- Try a larger model, which may be more robust to poor-quality audio
- Specify the language explicitly
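When a transcription comes back sparse rather than completely empty, the per-segment metadata can show where the model suspected silence. This sketch assumes the result dictionary returned by `transcribe()` contains a `segments` list whose entries carry `start`, `end`, and `no_speech_prob`, as in current Whisper releases. The helper `likely_silent_segments` is hypothetical, and its 0.6 default simply matches the default `no_speech_threshold`.

```python
def likely_silent_segments(result, threshold=0.6):
    """Return (start, end, no_speech_prob) for segments flagged as probably silent."""
    flagged = []
    for segment in result.get("segments", []):
        probability = segment.get("no_speech_prob", 0.0)
        if probability > threshold:
            flagged.append((segment["start"], segment["end"], probability))
    return flagged


# Example usage (requires Whisper and an audio file; the file name is a placeholder):
#   import whisper
#   result = whisper.load_model("base").transcribe("audio.mp3", language="en")
#   for start, end, probability in likely_silent_segments(result):
#       print(f"{start:.1f}s to {end:.1f}s: no-speech probability {probability:.2f}")
```

Many flagged segments suggest the audio is too quiet or contains no speech, while few or none point toward model size or language detection as the cause.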
Accuracy Issues
Poor transcription quality
If transcriptions are inaccurate, consider these factors.
Solutions:
- Use a larger model
- Specify the language to avoid language detection errors
- Check for known limitations:
  - Low-resource languages may have higher error rates
  - Background noise affects accuracy
  - Multiple speakers or crosstalk reduce quality
- Improve audio quality:
  - Remove background noise
  - Use higher bitrate audio
  - Ensure clear speech without overlapping speakers
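The first two suggestions, a larger model and an explicit language, map directly onto options of the `whisper` command-line tool. A small sketch of assembling such a command from a script follows; `build_whisper_command` is a hypothetical helper, and the file name and language in the usage comment are placeholders.

```python
import subprocess


def build_whisper_command(audio_path, model="medium", language=None, task="transcribe"):
    """Build a `whisper` CLI invocation with an explicit model and optional language."""
    command = ["whisper", str(audio_path), "--model", model, "--task", task]
    if language is not None:
        # Skipping automatic detection avoids errors from a wrongly detected language.
        command += ["--language", language]
    return command


# Example usage (requires the whisper CLI to be installed):
#   subprocess.run(
#       build_whisper_command("interview.mp3", model="large", language="Japanese"),
#       check=True,
#   )
```

Comparing the output of two model sizes on the same short clip is usually a quick way to tell whether the model or the audio is the limiting factor.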
Hallucinations - model generates text not in audio
The model may generate plausible-sounding text that wasn’t actually spoken.
Why it happens:
Models are trained on large-scale noisy data and may combine language modeling with transcription.
Mitigation strategies:
- Use beam search and temperature scheduling (temperature fallback is enabled by default in transcribe(); the whisper command-line tool also defaults to a beam size of 5)
- Use larger models, which tend to hallucinate less
- Enable word-level timestamps to identify suspicious sections
- Be especially cautious with low-resource languages, where hallucinations are more common
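Word-level timestamps come with a per-word probability, which gives a simple way to locate passages worth checking against the audio. The sketch assumes a result produced with `word_timestamps=True`, where each segment has a `words` list with `word`, `start`, `end`, and `probability` keys, as in current Whisper releases. The helper `suspicious_words` and its 0.5 cutoff are arbitrary choices made for illustration, not a validated hallucination detector.

```python
def suspicious_words(result, min_probability=0.5):
    """Return (start, end, word, probability) for words below the probability cutoff."""
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            if word["probability"] < min_probability:
                flagged.append(
                    (word["start"], word["end"], word["word"].strip(), word["probability"])
                )
    return flagged


# Example usage (requires Whisper and an audio file; the file name is a placeholder):
#   import whisper
#   model = whisper.load_model("small")
#   result = model.transcribe("meeting.mp3", word_timestamps=True)
#   for start, end, text, probability in suspicious_words(result):
#       print(f"{start:7.2f}s  {text!r}  p={probability:.2f}")
```

Low probability does not prove a word was hallucinated, and a confident word can still be wrong; the list is only a guide for where to listen first.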
Repetitive text in output
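The sequence-to-sequence architecture can sometimes generate repetitive text.
Solutions:
- Adjust temperature settings
- Set the condition_on_previous_text parameter to False so that each window is decoded without the previous output as a prompt; this can make the text less consistent across windows but makes repetition loops less likely
- Try a different model size; sometimes smaller or larger models perform better on specific audio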
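Both settings are keyword arguments of `transcribe()`. The sketch below bundles one possible starting configuration; the values are a hypothetical starting point rather than a recommendation from the Whisper authors, and `merged_options` and `transcribe_without_context` are helper names invented for this example. Whisper is imported lazily so the options can be inspected without it installed.

```python
# Starting point for fighting repetition loops. The temperature tuple and the
# compression ratio threshold restate transcribe()'s defaults for visibility;
# only condition_on_previous_text differs from the default.
DECODE_OPTIONS = {
    # Retry at increasing temperatures when a window looks degenerate.
    "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    # Highly compressible (repetitive) output above this ratio triggers a retry.
    "compression_ratio_threshold": 2.4,
    # Do not feed the previous window's text back in as a prompt.
    "condition_on_previous_text": False,
}


def merged_options(**overrides):
    """Return DECODE_OPTIONS with any caller-supplied overrides applied."""
    return {**DECODE_OPTIONS, **overrides}


def transcribe_without_context(audio_path, model_name="small", **overrides):
    """Transcribe with the options above; requires the openai-whisper package."""
    import whisper  # imported here so the helpers above work without Whisper

    model = whisper.load_model(model_name)
    return model.transcribe(str(audio_path), **merged_options(**overrides))
```

If repetition persists, try changing one option at a time, for example `transcribe_without_context("talk.wav", compression_ratio_threshold=2.0)`, so the effect of each change stays visible.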
Translation not working (turbo model)
The turbo model is not trained for translation tasks.
Symptoms:
- Using --task translate with --model turbo returns the original language instead of English
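Scripts that accept both a model name and a task can guard against this combination before any audio is processed. A minimal sketch, assuming that any model whose name contains "turbo" is affected and that medium is an acceptable substitute; `model_for_task` is a hypothetical helper, not a Whisper function.

```python
def model_for_task(requested_model, task, fallback="medium"):
    """Return a model name suited to `task`, swapping out turbo for translation."""
    if task == "translate" and "turbo" in requested_model:
        # turbo is not trained for translation, so substitute a model that is.
        return fallback
    return requested_model


# Example usage:
#   model_name = model_for_task("turbo", "translate")  # "medium"
#   model = whisper.load_model(model_name)
#   result = model.transcribe("speech.mp3", task="translate")
```

Transcription requests pass through unchanged, so turbo is still used wherever it is applicable.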
Solution:
Use a different multilingual model (medium or large) for translation, for example --model medium --task translate.
Platform-Specific Issues
Python version compatibility
Whisper requires Python 3.8 or newer.
Check your Python version, for example with python --version.
Solution:
If your Python version is too old, upgrade to Python 3.8, 3.9, 3.10, 3.11, or 3.12.
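The same check can be made from inside the interpreter that will run Whisper, which avoids confusion when several Python installations coexist. A minimal sketch; `python_is_supported` is a hypothetical helper, and the minimum version is the one stated in this section.

```python
import sys

# Minimum version stated in this section.
MINIMUM_PYTHON = (3, 8)


def python_is_supported(version_info=None):
    """Return True if the given (or running) Python meets the minimum version."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if __name__ == "__main__":
    status = "supported" if python_is_supported() else "too old for Whisper"
    print(f"Python {sys.version.split()[0]} is {status}")
```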
PyTorch compatibility issues
Whisper is tested with PyTorch 1.10.1 and later versions.
Solution:
Update PyTorch to a recent version. For GPU support, follow the PyTorch installation instructions for your platform.
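PyTorch version strings often carry a build suffix such as `+cu118`, so comparing them as plain strings is unreliable. The sketch below parses the numeric part and compares it with the minimum stated in this section; `parse_release` and `torch_is_supported` are hypothetical helpers, and the parser handles only the leading major.minor[.patch] part of a version string.

```python
import re

# Minimum version stated in this section.
MINIMUM_TORCH = (1, 10, 1)


def parse_release(version):
    """Extract the leading numeric release, e.g. '2.1.0+cu118' gives (2, 1, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def torch_is_supported(version):
    """Return True if `version` is at least the minimum tested PyTorch release."""
    return parse_release(version) >= MINIMUM_TORCH


# Example usage (requires PyTorch):
#   import torch
#   print(torch_is_supported(str(torch.__version__)))
```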
Getting Help
If you encounter an issue not covered here:
- Check existing issues on GitHub
- Search discussions in the repository
- Create a new issue with:
  - Your Python and PyTorch versions
  - Full error message and stack trace
  - Minimal code to reproduce the issue
  - Information about your system (OS, GPU if applicable)
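Most of the version and system details for a new issue can be collected with a short script. This is an illustrative sketch using only the standard library; `environment_report` is a hypothetical helper, and it assumes the packages are installed under the distribution names `torch` and `openai-whisper`.

```python
import platform
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report():
    """Collect the version details requested for a bug report."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "torch": installed_version("torch"),
        "openai-whisper": installed_version("openai-whisper"),
    }


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```

GPU details are not covered by this script. If PyTorch is installed with CUDA support, `torch.cuda.get_device_name(0)` can supply the GPU model for the report.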