Whisper offers six model sizes with different tradeoffs between speed, accuracy, and resource requirements. Four models have English-only versions optimized for English speech.

Available Models

Relative speeds are measured by transcribing English speech on an A100 GPU. Real-world performance varies based on language, speaking speed, and hardware.

Model Specifications

Size     Parameters   English-only   Multilingual   Required VRAM   Relative Speed
tiny     39 M         tiny.en        tiny           ~1 GB           ~10x
base     74 M         base.en        base           ~1 GB           ~7x
small    244 M        small.en       small          ~2 GB           ~4x
medium   769 M        medium.en      medium         ~5 GB           ~2x
large    1550 M       N/A            large          ~10 GB          1x
turbo    809 M        N/A            turbo          ~6 GB           ~8x
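The specifications above can be captured as a small lookup table in plain Python. This is an illustrative sketch: the `MODEL_SPECS` name and dictionary layout are not part of the Whisper library, only the values come from the table.

```python
# Model specifications from the table above (illustrative structure).
# VRAM is in GB; speed is relative to the large model (1x baseline).
MODEL_SPECS = {
    "tiny":   {"params_m": 39,   "vram_gb": 1,  "speed": 10, "english_only": "tiny.en"},
    "base":   {"params_m": 74,   "vram_gb": 1,  "speed": 7,  "english_only": "base.en"},
    "small":  {"params_m": 244,  "vram_gb": 2,  "speed": 4,  "english_only": "small.en"},
    "medium": {"params_m": 769,  "vram_gb": 5,  "speed": 2,  "english_only": "medium.en"},
    "large":  {"params_m": 1550, "vram_gb": 10, "speed": 1,  "english_only": None},
    "turbo":  {"params_m": 809,  "vram_gb": 6,  "speed": 8,  "english_only": None},
}

def fits_in_vram(name: str, available_gb: float) -> bool:
    """Check whether a model's approximate VRAM requirement fits a budget."""
    return MODEL_SPECS[name]["vram_gb"] <= available_gb

print([m for m in MODEL_SPECS if fits_in_vram(m, 6)])
# → ['tiny', 'base', 'small', 'medium', 'turbo']
```

A lookup like this is handy for picking the largest model that fits a GPU at startup instead of hard-coding one name.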

Model Variants

Tiny (39M parameters)

Best for: Real-time applications with limited resources
  • VRAM: ~1 GB
  • Speed: 10x faster than large
  • Models: tiny.en (English), tiny (Multilingual)
model = whisper.load_model("tiny")
# or for English-only
model = whisper.load_model("tiny.en")

Base (74M parameters)

Best for: Fast transcription with acceptable accuracy
  • VRAM: ~1 GB
  • Speed: 7x faster than large
  • Models: base.en (English), base (Multilingual)
model = whisper.load_model("base")
# or for English-only
model = whisper.load_model("base.en")

Small (244M parameters)

Best for: Balanced performance and resource usage
  • VRAM: ~2 GB
  • Speed: 4x faster than large
  • Models: small.en (English), small (Multilingual)
model = whisper.load_model("small")
# or for English-only
model = whisper.load_model("small.en")

Medium (769M parameters)

Best for: High accuracy with moderate speed
  • VRAM: ~5 GB
  • Speed: 2x faster than large
  • Models: medium.en (English), medium (Multilingual)
model = whisper.load_model("medium")
# or for English-only
model = whisper.load_model("medium.en")

Large (1550M parameters)

Best for: Maximum accuracy, translation tasks
  • VRAM: ~10 GB
  • Speed: Baseline (1x)
  • Models: large (Multilingual only)
  • Versions: large-v1, large-v2, large-v3
model = whisper.load_model("large")
# or specific version
model = whisper.load_model("large-v3")
The large model alias points to large-v3, the latest version.

Turbo (809M parameters)

Best for: Fast, accurate transcription (default model)
  • VRAM: ~6 GB
  • Speed: 8x faster than large
  • Models: turbo (Multilingual only)
  • Based on: Optimized large-v3
model = whisper.load_model("turbo")
The turbo model is not trained for translation tasks. Use medium or large models for translating speech to English.

English-only vs Multilingual

When to Use English-only Models

The .en models perform better on English audio, especially for tiny.en and base.en. The difference becomes less significant for larger models.
English-only models are available for: tiny, base, small, and medium sizes.

When to Use Multilingual Models

Required for:
  • Non-English transcription
  • Translation to English
  • Language identification
  • Multilingual applications

Model Selection Guide
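The guidance in the sections above can be condensed into a rule of thumb. The helper below is an illustrative sketch, not part of the Whisper library; the function name and VRAM thresholds are assumptions drawn from the tables on this page.

```python
def pick_model(english_only: bool, need_translation: bool, vram_gb: float) -> str:
    """Suggest a Whisper model name based on this page's guidance.

    Illustrative rules: translation needs medium or large (turbo is not
    trained for it); turbo is the fast multilingual default; .en variants
    help most at the smaller sizes.
    """
    if need_translation:
        # turbo is not trained for translation; prefer large if VRAM allows
        return "large" if vram_gb >= 10 else "medium"
    if vram_gb >= 6:
        return "turbo"  # fast, accurate default
    if vram_gb >= 2:
        return "small.en" if english_only else "small"
    return "tiny.en" if english_only else "tiny"

print(pick_model(english_only=True, need_translation=False, vram_gb=4))   # small.en
print(pick_model(english_only=False, need_translation=True, vram_gb=12))  # large
```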

Loading Models

Basic Loading

import whisper

model = whisper.load_model("turbo")

With Device Selection

import whisper

# Use specific GPU
model = whisper.load_model("turbo", device="cuda:0")

# Use CPU
model = whisper.load_model("turbo", device="cpu")

Custom Download Location

import whisper

model = whisper.load_model(
    "turbo",
    download_root="/path/to/models"
)

Listing Available Models

import whisper

# Get list of all available models
models = whisper.available_models()
print(models)
# ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 
#  'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 
#  'large', 'large-v3-turbo', 'turbo']

Model Versions

The large model has multiple versions with improvements:
  • large-v1: Original large model
  • large-v2: Improved accuracy
  • large-v3: Latest version with best accuracy
  • turbo: Optimized large-v3 for speed
Use large (alias for large-v3) or turbo for new projects to get the latest improvements.
