This page covers common issues you may encounter when installing or using Whisper, along with their solutions.

Installation Issues

Whisper requires the ffmpeg command-line tool to be installed on your system.

Solution: Install ffmpeg using your system’s package manager:
# Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# Arch Linux
sudo pacman -S ffmpeg

# macOS using Homebrew
brew install ffmpeg

# Windows using Chocolatey
choco install ffmpeg

# Windows using Scoop
scoop install ffmpeg
After installation, verify ffmpeg is accessible:
ffmpeg -version
Whisper depends on tiktoken for fast tokenization. If tiktoken doesn’t provide a pre-built wheel for your platform, you may need Rust installed.

Symptoms:
  • Installation errors during pip install
  • Messages about missing Rust compiler
Solution:
  1. Install Rust by following the Rust Getting Started guide
  2. Configure your PATH environment variable:
    export PATH="$HOME/.cargo/bin:$PATH"
    
  3. If you see No module named 'setuptools_rust', install it:
    pip install setuptools-rust
    
  4. Retry the Whisper installation:
    pip install -U openai-whisper
    
This error occurs when tiktoken needs to be built from source but setuptools_rust is not installed.

Solution:
pip install setuptools-rust
pip install -U openai-whisper

Runtime Issues

This occurs when the selected model requires more VRAM than your GPU has available.

VRAM Requirements:
  • tiny, base: ~1 GB
  • small: ~2 GB
  • medium: ~5 GB
  • turbo: ~6 GB
  • large: ~10 GB
Solutions:
  1. Use a smaller model:
    # Instead of:
    model = whisper.load_model("large")
    
    # Try:
    model = whisper.load_model("small")
    
  2. Use CPU instead of GPU:
    model = whisper.load_model("medium", device="cpu")
    
    Note: CPU inference will be significantly slower.
  3. Close other GPU-intensive applications to free up VRAM
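As a rough guide, the VRAM table above can be turned into a small helper that picks the largest model fitting your card. `pick_model` and its figures are illustrative, not part of the Whisper API:

```python
# Approximate VRAM needs (GB) from the table above, smallest to largest.
VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("turbo", 6), ("large", 10)]

def pick_model(free_gb: float) -> str:
    """Return the largest model whose approximate requirement fits in free_gb."""
    choice = "tiny"  # fall back to the smallest model
    for name, need in VRAM_GB:
        if need <= free_gb:
            choice = name
    return choice

# On a CUDA machine, total capacity can be queried with:
#   torch.cuda.get_device_properties(0).total_memory / 1e9
print(pick_model(4))  # a 4 GB card comfortably fits "small"
```

Note this checks total rather than free memory; other processes using the GPU reduce what is actually available.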
Model weights are downloaded from the internet on first use.

Solutions:
  1. Check your internet connection
  2. Use a different download location if your home directory has limited space:
    model = whisper.load_model("medium", download_root="/path/to/custom/location")
    
  3. Set XDG_CACHE_HOME to relocate the default cache directory (Whisper stores models under $XDG_CACHE_HOME/whisper when the variable is set):
    export XDG_CACHE_HOME="/path/to/cache"
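The resolution order can be sketched in a few lines; `default_download_root` is a hypothetical helper mirroring the behavior described above (XDG_CACHE_HOME if set, otherwise ~/.cache, plus a whisper subdirectory):

```python
import os

def default_download_root() -> str:
    """Where Whisper caches model weights when download_root is not given."""
    cache = os.getenv("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(cache, "whisper")

print(default_download_root())
```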
    
Whisper uses ffmpeg to handle audio files. Most formats are supported, but some may cause issues.

Solution: Convert your audio file to a widely supported format like WAV or MP3:
ffmpeg -i input.audio output.wav
Then transcribe the converted file:
whisper output.wav
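The same conversion can be scripted from Python by shelling out to ffmpeg. `to_wav` is a hypothetical helper; the `-ar 16000 -ac 1` flags produce the 16 kHz mono audio Whisper resamples to internally anyway:

```python
import shutil
import subprocess

def to_wav(src: str, dst: str = "output.wav") -> str:
    """Convert any ffmpeg-readable audio file to 16 kHz mono WAV."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst],
        check=True,
        capture_output=True,
    )
    return dst
```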
This can happen with audio files that contain no speech or very low-quality audio.

Solutions:
  1. Verify audio file contains audible speech:
    ffplay your-audio.mp3
    
  2. Check audio levels - audio may be too quiet
  3. Try a larger model, which may be more robust to poor-quality audio
  4. Specify the language explicitly:
    whisper audio.mp3 --language English
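For step 2, a quick loudness check on a 16-bit PCM WAV needs only the standard library. `rms_dbfs` is an illustrative helper, and the -40 dBFS cutoff mentioned in the comment is only a rule of thumb:

```python
import math
import struct
import wave

def rms_dbfs(path: str) -> float:
    """RMS level of a 16-bit PCM WAV in dBFS (0 = full scale, -inf = silence)."""
    with wave.open(path, "rb") as w:
        frames = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    if not samples:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms else -math.inf

# Readings far below about -40 dBFS usually indicate near-silent audio.
```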
    

Accuracy Issues

If transcriptions are inaccurate, several factors may be at play.

Solutions:
  1. Use a larger model:
    whisper audio.mp3 --model large
    
  2. Specify the language to avoid language detection errors:
    whisper audio.mp3 --language Japanese
    
  3. Check for known limitations:
    • Low-resource languages may have higher error rates
    • Background noise affects accuracy
    • Multiple speakers or crosstalk reduce quality
  4. Improve audio quality:
    • Remove background noise
    • Use higher bitrate audio
    • Ensure clear speech without overlapping speakers
The model may generate plausible-sounding text that wasn’t actually spoken.

Why it happens: Models are trained on large-scale noisy data and may combine language modeling with transcription.

Mitigation strategies:
  1. Use beam search and temperature scheduling (already enabled by default in transcribe())
  2. Use larger models which tend to hallucinate less
  3. Enable word-level timestamps to identify suspicious sections:
    result = model.transcribe("audio.mp3", word_timestamps=True)
    
  4. Be especially cautious with low-resource languages where hallucinations are more common
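Building on step 3, the word entries in the returned segments each carry a probability field, so suspicious stretches can be collected programmatically. `suspicious_words` and the 0.4 threshold are illustrative:

```python
def suspicious_words(result: dict, threshold: float = 0.4) -> list:
    """Collect (word, start, end, probability) tuples below the given probability.

    Expects the dict returned by model.transcribe(..., word_timestamps=True):
    each entry in result["segments"] carries a "words" list whose items have
    "word", "start", "end", and "probability" fields.
    """
    flagged = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            if w.get("probability", 1.0) < threshold:
                flagged.append((w["word"], w["start"], w["end"], w["probability"]))
    return flagged
```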
The sequence-to-sequence architecture can sometimes generate repetitive text.

Solutions:
  1. Adjust temperature settings:
    result = model.transcribe("audio.mp3", temperature=0.2)
    
  2. Disable conditioning on previous text via the condition_on_previous_text parameter:
    result = model.transcribe("audio.mp3", condition_on_previous_text=False)
    
  3. Try a different model size - sometimes smaller or larger models perform better on specific audio
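The first two solutions combine naturally: temperature also accepts a tuple of fallback values that transcribe() walks through when a decode looks degenerate (for example, when the text’s gzip compression ratio exceeds compression_ratio_threshold). The specific values below are illustrative:

```python
# Decoding options that curb repetition; pass them to model.transcribe().
ANTI_REPETITION_OPTS = dict(
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # retry at higher temperatures on failure
    compression_ratio_threshold=2.4,  # gzip ratio above this marks output as too repetitive
    condition_on_previous_text=False,  # don't feed earlier output back in as a prompt
)

# Usage: result = model.transcribe("audio.mp3", **ANTI_REPETITION_OPTS)
```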
The turbo model is not trained for translation tasks.

Symptoms:
  • Using --task translate with --model turbo returns original language instead of English
Solution: Use a multilingual model (medium or large) for translation:
# Don't use turbo for translation
whisper japanese.wav --model medium --language Japanese --task translate
The turbo model will return the original language even if --task translate is specified.

Platform-Specific Issues

Whisper requires Python 3.8 or newer.

Check your Python version:
python --version
Solution: If your Python version is too old, upgrade to Python 3.8, 3.9, 3.10, 3.11, or 3.12.
Whisper is tested with PyTorch 1.10.1 and later versions.

Solution: Update PyTorch to a recent version:
pip install --upgrade torch
For GPU support, follow PyTorch installation instructions for your platform.
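Both requirements can be verified in one place from Python. `check_environment` is an illustrative helper; the PyTorch check only confirms that torch imports:

```python
import sys

def check_environment(min_python=(3, 8)) -> list:
    """Return human-readable problems; an empty list means the basics look fine."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append("Python %d.%d is older than required %d.%d"
                        % (sys.version_info[0], sys.version_info[1], min_python[0], min_python[1]))
    try:
        import torch  # noqa: F401 -- only checking that PyTorch is importable
    except ImportError:
        problems.append("PyTorch is not installed (pip install torch)")
    return problems

print(check_environment())  # [] when both checks pass
```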

Getting Help

If you encounter an issue not covered here:
  1. Check existing issues on GitHub
  2. Search discussions in the repository
  3. Create a new issue with:
    • Your Python and PyTorch versions
    • Full error message and stack trace
    • Minimal code to reproduce the issue
    • Information about your system (OS, GPU if applicable)
