CLI Usage

Overview

The whisper command-line tool transcribes audio files and can translate speech into English. It reads any audio format that ffmpeg can decode, and its options control the model, language, output format, and decoding behavior.

Basic Usage

Install Whisper

pip install -U openai-whisper

You’ll also need ffmpeg installed on your system. On Ubuntu or Debian:

sudo apt update && sudo apt install ffmpeg
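
Before the first run, it can help to confirm that both executables are on your PATH. The following is a minimal sketch of such a check; have_tool is a hypothetical helper name, not something Whisper provides.

```shell
# have_tool is a hypothetical helper: succeeds if the named executable is on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report the status of the two executables this guide relies on.
for tool in ffmpeg whisper; do
  if have_tool "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```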

Transcribe an audio file

whisper audio.mp3

By default, this uses the turbo model and outputs all available formats (txt, vtt, srt, tsv, json).

Command Syntax

whisper audio_file [audio_file ...] [options]

Multiple Files

Process multiple audio files in one command:

whisper audio.flac audio.mp3 audio.wav --model turbo

Model Selection

Choose from different model sizes to balance speed and accuracy:

whisper audio.mp3 --model medium

Available models: tiny, base, small, medium, large, turbo, or English-only variants (tiny.en, base.en, small.en, medium.en).

The default model is turbo, which offers fast transcription with good accuracy for English and multilingual content.

Language Options

Automatic Language Detection

By default, Whisper detects the language automatically from the first 30 seconds of audio:

whisper japanese.wav

Specify Language

If you already know the language, specify it explicitly to skip the detection step:

whisper japanese.wav --language Japanese

You can use either the language name (e.g., Japanese, Spanish) or language code (e.g., ja, es).

Translation to English

Translate non-English speech directly to English:

whisper japanese.wav --model medium --language Japanese --task translate

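The turbo model is not trained for translation and returns text in the original language even when --task translate is specified. Use a multilingual model (tiny, base, small, medium, large) for translation tasks; medium or large give the best results.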
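
A wrapper script can check the model choice before starting a long translation job. The following is a minimal sketch: can_translate is a hypothetical helper, and the rule it encodes (reject turbo and the English-only .en variants) is taken from the note above rather than from any Whisper API.

```shell
# can_translate is a hypothetical helper: succeeds when MODEL is a multilingual
# model suitable for --task translate. turbo and English-only *.en are rejected.
can_translate() {
  case "$1" in
    *turbo | *.en) return 1 ;;
    *) return 0 ;;
  esac
}

model=medium
if can_translate "$model"; then
  echo "ok: $model can be used with --task translate"
else
  echo "choose a multilingual model instead of $model" >&2
fi
```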

Output Options

Output Directory

Specify where to save the transcription files:

whisper audio.mp3 --output_dir ./transcripts

Output Format

Choose specific output formats:

whisper audio.mp3 --output_format srt

Available formats:

txt - Plain text
vtt - WebVTT subtitles
srt - SubRip subtitles
tsv - Tab-separated values with timestamps
json - JSON with detailed segment information
all - Generate all formats (default)
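
Whisper names each transcript after its input file, so the output paths are predictable. The sketch below prints the paths to expect for a given input, output directory, and format. expected_outputs is a hypothetical helper, and the naming rule it uses (input basename without its last extension, plus the format extension) is an assumption worth checking against your installed version.

```shell
# expected_outputs is a hypothetical helper: prints the transcript paths
# expected for one input file.
# Usage: expected_outputs AUDIO [OUTPUT_DIR] [FORMAT]
expected_outputs() {
  audio=$1
  out_dir=${2:-.}
  format=${3:-all}
  base=$(basename "$audio")
  stem=${base%.*}                      # drop the last extension only
  if [ "$format" = "all" ]; then
    format="txt vtt srt tsv json"      # the formats listed above
  fi
  for ext in $format; do
    printf '%s/%s.%s\n' "$out_dir" "$stem" "$ext"
  done
}

expected_outputs recordings/audio.mp3 ./transcripts srt
```

With the arguments shown, the helper prints ./transcripts/audio.srt.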

Advanced Options

Word-Level Timestamps

Extract word-level timestamps for precise timing:

whisper audio.mp3 --word_timestamps True

This enables additional subtitle formatting options:

whisper audio.mp3 --word_timestamps True --max_line_width 50 --highlight_words True

Device Selection

Choose between CPU and GPU processing. By default, Whisper uses CUDA when it is available and falls back to the CPU otherwise:

whisper audio.mp3 --device cuda  # Use GPU
whisper audio.mp3 --device cpu   # Use CPU

Initial Prompt

Provide context or custom vocabulary to improve accuracy:

whisper audio.mp3 --initial_prompt "This is a technical discussion about machine learning and neural networks."

Temperature and Sampling

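Deterministic Decoding

Use temperature 0 for deterministic output. The --beam_size option sets the number of beams for beam search and applies only when the temperature is 0:

whisper audio.mp3 --temperature 0 --beam_size 5

Sampling

Use a non-zero temperature for sampling. The --best_of option sets the number of candidates to sample and applies only when the temperature is non-zero:

whisper audio.mp3 --temperature 0.8 --best_of 5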

Compression and Quality Thresholds

whisper audio.mp3 \
  --compression_ratio_threshold 2.4 \
  --logprob_threshold -1.0 \
  --no_speech_threshold 0.6

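--compression_ratio_threshold: Treat decoding as failed and retry if the gzip compression ratio of the output exceeds this value, which indicates overly repetitive text (default: 2.4)
--logprob_threshold: Treat decoding as failed and retry if the average log probability falls below this value (default: -1.0)
--no_speech_threshold: Treat a segment as silence if its no-speech probability exceeds this value and decoding has also failed the --logprob_threshold check (default: 0.6)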
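
When a segment fails one of these checks, the CLI retries it at a higher temperature. The step size is set by --temperature_increment_on_fallback, an option not covered above. The default step of 0.2 and the upper bound of 1.0 used in this sketch are assumptions to confirm with whisper --help. fallback_temperatures is a hypothetical helper that only prints the resulting schedule; it does not run Whisper.

```shell
# fallback_temperatures is a hypothetical helper: prints the temperatures tried
# in order, starting at START and stepping by INC up to 1.0.
# Usage: fallback_temperatures [START] [INC]
fallback_temperatures() {
  awk -v start="${1:-0}" -v inc="${2:-0.2}" 'BEGIN {
    # A non-positive step means no fallback: only the starting temperature.
    if (inc <= 0) { printf "%.1f\n", start; exit }
    # Small tolerance so that 1.0 itself is included despite rounding.
    for (i = 0; start + i * inc <= 1.0 + 0.000001; i++)
      printf "%.1f\n", start + i * inc
  }'
}

fallback_temperatures 0 0.2
```

Under these assumptions, a run that starts at temperature 0 can make up to six attempts per segment, which is one reason difficult audio takes longer to process.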

Common Examples

Transcribe with High Accuracy

whisper interview.mp3 --model large --language English

Generate SRT Subtitles

whisper video.mp4 --model medium --output_format srt --word_timestamps True

Process Specific Audio Clips

whisper podcast.mp3 --clip_timestamps "0,300,600,900" --output_dir ./segments

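The value is a comma-separated list of start,end pairs in seconds, so this processes clips from 0-300s and 600-900s. If the final end value is omitted, the last clip runs to the end of the file.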
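
The --clip_timestamps value is a flat list that Whisper reads as start,end pairs. To make the pairing explicit, the sketch below splits such a list into ranges. clip_ranges is a hypothetical helper, not part of Whisper; it shows an unpaired final value as running to the end of the file, which is how the CLI help text describes it (verify with whisper --help).

```shell
# clip_ranges is a hypothetical helper: prints one START-END range per line
# for a comma-separated list of timestamps in seconds.
clip_ranges() {
  old_ifs=$IFS
  IFS=,
  set -- $1          # split the list on commas into positional parameters
  IFS=$old_ifs
  while [ "$#" -gt 0 ]; do
    if [ "$#" -ge 2 ]; then
      printf '%s-%s\n' "$1" "$2"
      shift 2
    else
      printf '%s-end\n' "$1"   # unpaired start: clip runs to the end of the file
      shift
    fi
  done
}

clip_ranges "0,300,600,900"
```

For the list used above, the helper prints 0-300 and 600-900 on separate lines.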

Batch Processing with Consistent Settings

whisper *.wav --model turbo --language English --output_dir ./output --output_format json
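
If a batch is interrupted, rerunning the same command transcribes every file again. One way to resume is to skip inputs whose JSON transcript already exists. The sketch below assumes that each transcript is named after the basename of its input file. transcribe_missing and the DRY_RUN variable are hypothetical names introduced for illustration.

```shell
# transcribe_missing is a hypothetical helper: runs whisper only for inputs
# that do not yet have a JSON transcript in OUT_DIR.
# Set DRY_RUN=1 to print the commands instead of running them.
# Usage: transcribe_missing OUT_DIR FILE...
transcribe_missing() {
  out_dir=$1
  shift
  for audio in "$@"; do
    base=$(basename "$audio")
    stem=${base%.*}
    if [ -e "$out_dir/$stem.json" ]; then
      echo "skip: $audio" >&2
    elif [ "${DRY_RUN:-0}" = "1" ]; then
      echo "whisper $audio --model turbo --output_dir $out_dir --output_format json"
    else
      whisper "$audio" --model turbo --output_dir "$out_dir" --output_format json
    fi
  done
}

# Example (commented out so the definition can be sourced safely):
# transcribe_missing ./output *.wav
```

Running it once with DRY_RUN=1 shows which files would be processed before committing to a long job.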

Full Options Reference

View all available options:

whisper --help

For large files or batch processing, consider using a GPU with --device cuda to significantly speed up transcription.
