Overview
Performs speech recognition using Faster Whisper, an optimized implementation of OpenAI's Whisper model built on CTranslate2. It provides up to 4x faster inference than the standard Whisper implementation with similar accuracy. This method works completely offline: no internet connection is required after the model has been downloaded.
Method Signature
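The signature itself was lost from this page; below is a plausible sketch reconstructed from the parameter descriptions that follow. The parameter names (in particular `init_options`) and the defaults shown are assumptions, and in the actual library this would be a method on the `Recognizer` class rather than a free function:

```python
from typing import Any, Optional, Union

def recognize_faster_whisper(
    audio_data: "AudioData",              # a speech_recognition.AudioData instance
    model: str = "base",                  # Whisper model size ("tiny" ... "large-v3", "turbo")
    show_dict: bool = False,              # True -> full result dict instead of plain text
    language: Optional[str] = None,       # e.g. "en"; auto-detected when None
    task: str = "transcribe",             # "transcribe" or "translate"
    beam_size: int = 5,                   # beam search width (assumed default)
    init_options: Optional[dict] = None,  # device / compute_type / download_root
    **transcribe_options: Any,            # forwarded to Faster Whisper's transcribe()
) -> Union[str, dict]:
    ...
```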
Parameters
The audio data to recognize. Must be an `AudioData` instance.

Whisper model size to use. Available models:

- `"tiny"` - Smallest and fastest (39M parameters)
- `"base"` - Good balance (74M parameters)
- `"small"` - Better accuracy (244M parameters)
- `"medium"` - High accuracy (769M parameters)
- `"large"` or `"large-v3"` - Best accuracy (1550M parameters)
- `"turbo"` - Optimized large model

If `True`, returns a dictionary with full transcription details, including the detected language and segments. If `False`, returns only the transcription text.

Options for model initialization:

- `device`: `"cpu"`, `"cuda"`, or `"auto"` (default: auto)
- `compute_type`: `"int8"`, `"float16"`, or `"float32"` (default: auto-selected)
- `download_root`: Directory in which to cache models (default: `~/.cache/huggingface/hub`)

Language code (e.g., `"en"`, `"es"`, `"fr"`). If not specified, the language is detected automatically.

Task to perform:

- `"transcribe"` - Transcribe the audio in its original language
- `"translate"` - Transcribe and translate to English
Beam size for beam search decoding. Higher values may improve accuracy but increase computation time.
Additional options passed to Faster Whisper’s transcribe method. See Faster Whisper documentation for all available options.
Return Value
Simple Mode (`show_dict=False`)
The transcribed text from the audio.
Dictionary Mode (`show_dict=True`)
Dictionary containing:
- `text` (str): The transcribed text
- `segments` (list): A list of segment objects with timestamps and text
- `language` (str): The detected or specified language code
Exceptions
Raised if the speech is unintelligible or transcription fails.
Raised if there’s an error loading the model or processing audio.
Setup
Installation
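The package is published on PyPI as `faster-whisper`; a typical install command, assuming a standard pip environment (if you call this method through the SpeechRecognition library, that package must be installed as well):

```shell
pip install faster-whisper
```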
Install Faster Whisper:

Model Download
Models are automatically downloaded on first use and cached locally. The first run downloads the selected model, which can take a few minutes depending on model size and internet speed.

Examples
Basic Usage
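The example code on this page was lost in extraction. Below is a minimal sketch, assuming the method is exposed as `Recognizer.recognize_faster_whisper` in the SpeechRecognition library (the helper function name is hypothetical, and the import is deferred so the sketch is self-contained):

```python
def transcribe_from_mic(model: str = "base") -> str:
    """Capture one phrase from the default microphone and transcribe it offline."""
    import speech_recognition as sr  # assumed host library; needs faster-whisper installed

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # microphone access requires PyAudio
        recognizer.adjust_for_ambient_noise(source)  # optional noise calibration
        audio = recognizer.listen(source)
    return recognizer.recognize_faster_whisper(audio, model=model)

# text = transcribe_from_mic()
```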
With Language Specification
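A sketch of passing an explicit language code, which skips auto-detection; it assumes the SpeechRecognition library as the host, and the helper name is hypothetical:

```python
def transcribe_known_language(wav_path: str, language: str = "es") -> str:
    """Transcribe a file whose language is already known, skipping auto-detection."""
    import speech_recognition as sr  # assumed host library

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    # language is a code such as "en", "es", or "fr"
    return recognizer.recognize_faster_whisper(audio, model="small", language=language)
```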
From Audio File
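A sketch of transcribing a recording on disk, assuming the SpeechRecognition library's `AudioFile` source (which reads WAV, AIFF, and FLAC files); the helper name is hypothetical:

```python
def transcribe_file(wav_path: str, model: str = "base") -> str:
    """Load an audio file (WAV/AIFF/FLAC) and transcribe it offline."""
    import speech_recognition as sr  # assumed host library

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into an AudioData
    return recognizer.recognize_faster_whisper(audio, model=model)

# transcribe_file("meeting.wav")
```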
With Full Response Details
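A sketch of dictionary mode, assuming the SpeechRecognition host library; the `start`/`end`/`text` segment attributes follow Faster Whisper's `Segment` type, which is an assumption about what the returned segment objects expose:

```python
def transcribe_with_details(wav_path: str) -> dict:
    """Return the full result: text, per-segment timestamps, and detected language."""
    import speech_recognition as sr  # assumed host library

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    result = recognizer.recognize_faster_whisper(audio, model="base", show_dict=True)

    print("detected language:", result["language"])
    for segment in result["segments"]:
        # start/end/text attribute names assumed from faster-whisper's Segment type
        print(f"[{segment.start:6.2f}s -> {segment.end:6.2f}s] {segment.text}")
    return result
```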
Translation to English
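A sketch of the translate task, assuming the SpeechRecognition host library and a `task` keyword as described in the Parameters section above; the helper name is hypothetical:

```python
def translate_to_english(wav_path: str) -> str:
    """Transcribe speech in any supported language and translate it to English."""
    import speech_recognition as sr  # assumed host library

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    # task="translate" makes Whisper emit English regardless of the source language
    return recognizer.recognize_faster_whisper(audio, model="medium", task="translate")
```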
GPU Acceleration
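A sketch of running on a CUDA GPU with half-precision weights; it assumes the SpeechRecognition host library and that the initialization options are passed via an `init_options` keyword (the keyword name is an assumption):

```python
def transcribe_on_gpu(wav_path: str) -> str:
    """Run a large model on CUDA with float16 weights for maximum throughput."""
    import speech_recognition as sr  # assumed host library

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_faster_whisper(
        audio,
        model="large-v3",
        init_options={"device": "cuda", "compute_type": "float16"},  # keyword name assumed
    )
```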
Custom Model Cache Location
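A sketch of redirecting the model cache via `download_root`, again assuming the SpeechRecognition host library and the `init_options` keyword name:

```python
def transcribe_with_cache_dir(wav_path: str, cache_dir: str) -> str:
    """Download and cache models under cache_dir instead of ~/.cache/huggingface/hub."""
    import speech_recognition as sr  # assumed host library

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_faster_whisper(
        audio,
        model="base",
        init_options={"download_root": cache_dir},  # keyword name assumed
    )
```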
Performance Comparison
Faster Whisper provides significant performance improvements over standard Whisper:

| Model | Standard Whisper | Faster Whisper | Speedup |
|---|---|---|---|
| tiny | 32s | 6s | 5.3x |
| base | 46s | 10s | 4.6x |
| small | 83s | 18s | 4.6x |
| medium | 152s | 32s | 4.8x |
| large | 251s | 55s | 4.6x |
Model Selection Guide
- tiny - Testing, prototyping (lowest accuracy)
- base - General use, real-time applications
- small - Good balance of speed and accuracy
- medium - High accuracy needed, acceptable latency
- large/large-v3 - Maximum accuracy, offline processing
- turbo - Best of both worlds (accuracy + speed)
Language Support
Supports 99 languages, including:

- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Dutch (nl)
- Russian (ru)
- Chinese (zh)
- Japanese (ja)
- Korean (ko)
- Arabic (ar)
- Hindi (hi)
- And 86 more…
Compute Types
Different compute types offer trade-offs between speed, memory, and accuracy:

- int8 - 4x smaller, 2x faster, slight accuracy loss
- float16 - 2x smaller, 2x faster (GPU only), minimal accuracy loss
- float32 - Full precision, slower, maximum accuracy
Best Practices
Related Methods
- `recognize_whisper()` - Standard Whisper (slower but simpler)
- `recognize_openai()` - Cloud-based OpenAI Whisper API
- `recognize_groq()` - Ultra-fast cloud Whisper via Groq