Introduction

What is Whisper?

Whisper is a general-purpose speech recognition model developed by OpenAI. It is a Transformer sequence-to-sequence model trained on a large dataset of diverse audio to perform several speech processing tasks: multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline.
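
To make the token-sequence idea more concrete, here is a minimal sketch of how a task might be spelled out as the special tokens that prefix the decoder output. The token strings follow the format used by the reference tokenizer; the helper name `build_prompt` is hypothetical and is not part of the `whisper` package, which handles this internally.

```python
from typing import List


def build_prompt(language: str, task: str = "transcribe",
                 timestamps: bool = False) -> List[str]:
    """Illustrative only: express a task as decoder prefix tokens."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")

    tokens = ["<|startoftranscript|>"]   # marks the start of decoding
    tokens.append(f"<|{language}|>")     # spoken language, e.g. <|en|>
    tokens.append(f"<|{task}|>")         # which task the decoder should perform
    if not timestamps:
        tokens.append("<|notimestamps|>")  # ask for plain text without timestamps
    return tokens
```

Under these assumptions, switching from transcription to translation changes only one token in the prefix, which is how a single decoder can cover several tasks.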

Key Features

Multilingual Recognition

Transcribe speech in 99+ languages with high accuracy across diverse accents and dialects

Speech Translation

Translate speech from any supported language directly into English text

Multiple Model Sizes

Choose from 6 model sizes (tiny to turbo) to balance speed and accuracy for your use case

Simple API

Easy-to-use Python API and command-line interface for quick integration
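
As a rough illustration of the Python API, the sketch below wraps `whisper.load_model` and `model.transcribe` in one function. The wrapper name `run_whisper`, its defaults, and the `VALID_TASKS` constant are assumptions made for this example; only the two `whisper` calls and the `"text"` key of the result come from the library.

```python
from typing import Optional

VALID_TASKS = ("transcribe", "translate")


def run_whisper(audio_path: str, model_name: str = "base",
                task: str = "transcribe",
                language: Optional[str] = None) -> str:
    """Transcribe one audio file, or translate it into English, and return the text."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")

    # Imported lazily so the argument check above works without the package installed.
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)
    # language=None lets the model detect the spoken language on its own.
    result = model.transcribe(audio_path, task=task, language=language)
    return result["text"]
```

For example, `run_whisper("audio.mp3")` would return a transcript, and `run_whisper("audio.mp3", task="translate")` would return English text; `audio.mp3` is a placeholder path.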

Available Models

Whisper offers six model sizes with varying speed and accuracy tradeoffs:

Size	Parameters	English-only	Multilingual	Required VRAM	Relative Speed
tiny	39 M	`tiny.en`	`tiny`	~1 GB	~10x
base	74 M	`base.en`	`base`	~1 GB	~7x
small	244 M	`small.en`	`small`	~2 GB	~4x
medium	769 M	`medium.en`	`medium`	~5 GB	~2x
large	1550 M	N/A	`large`	~10 GB	1x
turbo	809 M	N/A	`turbo`	~6 GB	~8x

The English-only `.en` models tend to perform better in English-only applications, especially `tiny.en` and `base.en`; the difference becomes less significant for `small.en` and `medium.en`. The `turbo` model is an optimized version of `large-v3` that offers faster transcription speed with minimal degradation in accuracy.

The turbo model is not trained for translation tasks. If you need to translate non-English speech into English, use one of the multilingual models (tiny, base, small, medium, large) instead.
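
The table and the notes above can be turned into a small selection helper. This is only a sketch: the figures are copied from the table, while `pick_model`, `ModelSpec`, and the rule "prefer the largest parameter count that fits in memory" are assumptions made for illustration, not part of Whisper.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    name: str
    params_m: int         # parameter count in millions
    vram_gb: int          # approximate required VRAM
    has_en_variant: bool  # whether a `.en` English-only checkpoint exists
    translates: bool      # whether the model is trained for translation


MODELS = [
    ModelSpec("tiny", 39, 1, True, True),
    ModelSpec("base", 74, 1, True, True),
    ModelSpec("small", 244, 2, True, True),
    ModelSpec("medium", 769, 5, True, True),
    ModelSpec("large", 1550, 10, False, True),
    ModelSpec("turbo", 809, 6, False, False),
]


def pick_model(vram_gb: float, english_only: bool = False,
               need_translation: bool = False) -> Optional[str]:
    """Return a model name that fits the VRAM budget, or None if none fits."""
    if english_only and need_translation:
        raise ValueError("translation requires a multilingual model")

    candidates = [
        m for m in MODELS
        if m.vram_gb <= vram_gb and (m.translates or not need_translation)
    ]
    if not candidates:
        return None

    # Heuristic (an assumption): a larger parameter count stands in for accuracy.
    best = max(candidates, key=lambda m: m.params_m)
    if english_only and best.has_en_variant:
        return best.name + ".en"
    return best.name
```

With a 6 GB budget this sketch picks `turbo` for transcription but falls back to `medium` when translation is required, which mirrors the note about `turbo` above.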

Getting Started

Installation

Install Whisper and its dependencies on your system

Quickstart

Start transcribing audio in minutes with CLI and Python examples

Resources

Research Paper

Read the technical paper on arXiv

Blog Post

Learn more from the official OpenAI blog

Model Card

View detailed model specifications

Python Compatibility

Whisper is compatible with Python 3.8-3.13 and recent PyTorch versions. The codebase was developed and tested with Python 3.9.9 and PyTorch 1.10.1.
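
If you want to check an interpreter against the range stated above before installing, a minimal sketch could look like this. The bounds are taken from the text; the function name `is_supported_python` is hypothetical.

```python
import sys
from typing import Optional, Tuple

MIN_PYTHON = (3, 8)   # lower bound stated above
MAX_PYTHON = (3, 13)  # upper bound stated above


def is_supported_python(version: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if (major, minor) falls inside the stated range."""
    if version is None:
        version = sys.version_info[:2]  # the running interpreter
    return MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON
```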
