Whisper Approach

What is Whisper?

Whisper is a general-purpose speech recognition model developed by OpenAI. It is trained on a large, diverse audio dataset and is a multitask model: a single Transformer sequence-to-sequence model is trained jointly on multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are represented as a sequence of tokens for the decoder to predict, which lets one model replace many stages of a traditional speech-processing pipeline.
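To make the token-based multitask format concrete, here is a minimal Python sketch. The helper `build_decoder_prefix` is hypothetical (it is not part of the `whisper` package); it only assembles the special tokens that, per the Whisper paper, precede the text the decoder generates: a start-of-transcript token, a language token, a task token (`transcribe` or `translate`), and optionally a no-timestamps token.

```python
from typing import List


def build_decoder_prefix(language: str, task: str = "transcribe", timestamps: bool = False) -> List[str]:
    """Hypothetical helper: return the special-token prefix that selects the task.

    Illustration of the token format only, not the library's implementation.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    # Start token, then the language tag, then the task tag.
    prefix = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Signals that the decoder should emit plain text without timestamp tokens.
        prefix.append("<|notimestamps|>")
    return prefix


print(build_decoder_prefix("fr", task="translate"))
```

When the language is not known in advance, the model can predict the language token itself, which is how language identification fits into the same sequence format.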

Key Features

Multilingual Recognition

Transcribe speech in 99+ languages across diverse accents and dialects; accuracy varies widely by language

Speech Translation

Translate speech from any supported language directly into English text

Multiple Model Sizes

Choose from 6 model sizes (tiny to turbo) to balance speed and accuracy for your use case

Simple API

Easy-to-use Python API and command-line interface for quick integration

Available Models

Whisper offers six model sizes with varying speed and accuracy tradeoffs:
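| Size   | Parameters | English-only | Multilingual | Required VRAM | Relative speed (vs. large) |
|--------|------------|--------------|--------------|---------------|----------------------------|
| tiny   | 39 M       | tiny.en      | tiny         | ~1 GB         | ~10x                       |
| base   | 74 M       | base.en      | base         | ~1 GB         | ~7x                        |
| small  | 244 M      | small.en     | small        | ~2 GB         | ~4x                        |
| medium | 769 M      | medium.en    | medium       | ~5 GB         | ~2x                        |
| large  | 1550 M     | N/A          | large        | ~10 GB        | 1x                         |
| turbo  | 809 M      | N/A          | turbo        | ~6 GB         | ~8x                        |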
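As a rough illustration of the size/speed trade-off, the sketch below encodes the table's figures and picks the largest model that fits a given VRAM budget. The `MODELS` dictionary and the `largest_model_for` and `estimated_seconds` helpers are hypothetical, and using parameter count as a stand-in for accuracy is an assumption, not a published ranking.

```python
from typing import Optional

# Approximate figures from the model table: parameters (millions),
# required VRAM (GB), and speed relative to large. Rough guides only.
MODELS = {
    "tiny":   {"params_m": 39,   "vram_gb": 1,  "speed": 10},
    "base":   {"params_m": 74,   "vram_gb": 1,  "speed": 7},
    "small":  {"params_m": 244,  "vram_gb": 2,  "speed": 4},
    "medium": {"params_m": 769,  "vram_gb": 5,  "speed": 2},
    "large":  {"params_m": 1550, "vram_gb": 10, "speed": 1},
    "turbo":  {"params_m": 809,  "vram_gb": 6,  "speed": 8},
}


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Hypothetical heuristic: the biggest model (by parameters) that fits the budget."""
    fitting = [name for name, m in MODELS.items() if m["vram_gb"] <= vram_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda name: MODELS[name]["params_m"])


def estimated_seconds(name: str, large_seconds: float) -> float:
    """Scale a runtime measured with `large` by the table's relative speed."""
    return large_seconds / MODELS[name]["speed"]


print(largest_model_for(8))
print(estimated_seconds("tiny", 100))
```

With an 8 GB budget this heuristic picks turbo; the runtime estimate simply divides a measured large-model time by the relative-speed factor, so real timings will vary with hardware and audio.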
For English-only applications, the .en models tend to perform better, especially tiny.en and base.en. The turbo model is an optimized version of large-v3 that offers faster transcription with minimal degradation in accuracy.
The turbo model is not trained for translation tasks. If you need to translate non-English speech into English, use one of the multilingual models (tiny, base, small, medium, large) instead.
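The translation caveat can be expressed as a simple guard. `supports_translation` is a hypothetical helper that only knows the model names listed in the table: it accepts the multilingual tiny, base, small, medium, and large models and rejects turbo and the English-only `.en` variants.

```python
# Multilingual models that, per the note above, can translate into English.
TRANSLATION_MODELS = {"tiny", "base", "small", "medium", "large"}


def supports_translation(model_name: str) -> bool:
    """Hypothetical check: True if the named model can translate speech into English."""
    return model_name in TRANSLATION_MODELS


for name in ("medium", "turbo", "base.en"):
    print(name, supports_translation(name))
```

An explicit allow-list is used rather than a naming rule so that unknown model names are rejected instead of silently accepted.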

Getting Started

Installation

Install Whisper and its dependencies on your system

Quickstart

Start transcribing audio in minutes with CLI and Python examples

Resources

Research Paper

Read the technical paper on arXiv

Blog Post

Learn more from the official OpenAI blog

Model Card

View detailed model specifications

Python Compatibility

Whisper is compatible with Python 3.8-3.13 and recent PyTorch versions. The codebase was developed and tested with Python 3.9.9 and PyTorch 1.10.1.
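A quick way to check whether the current interpreter falls within the documented range is sketched below. `is_supported_python` is a hypothetical helper that mirrors the 3.8-3.13 range stated above; it is not a check performed by the Whisper package itself.

```python
import sys
from typing import Optional, Tuple

# Documented compatible range (inclusive), as (major, minor).
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 13)


def is_supported_python(version: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if `version` (default: the running interpreter) is in range."""
    if version is None:
        version = (sys.version_info.major, sys.version_info.minor)
    return SUPPORTED_MIN <= tuple(version[:2]) <= SUPPORTED_MAX


print(is_supported_python())
```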
