Complete Language List
Whisper supports the following 100 languages:
A-D Languages (20)
- Afrikaans (af)
- Albanian (sq)
- Amharic (am)
- Arabic (ar)
- Armenian (hy)
- Assamese (as)
- Azerbaijani (az)
- Bashkir (ba)
- Basque (eu)
- Belarusian (be)
- Bengali (bn)
- Bosnian (bs)
- Breton (br)
- Bulgarian (bg)
- Cantonese (yue)
- Catalan (ca)
- Chinese (zh)
- Croatian (hr)
- Czech (cs)
- Danish (da)
E-I Languages (20)
- Dutch (nl)
- English (en)
- Estonian (et)
- Faroese (fo)
- Finnish (fi)
- French (fr)
- Galician (gl)
- Georgian (ka)
- German (de)
- Greek (el)
- Gujarati (gu)
- Haitian Creole (ht)
- Hausa (ha)
- Hawaiian (haw)
- Hebrew (he)
- Hindi (hi)
- Hungarian (hu)
- Icelandic (is)
- Indonesian (id)
- Italian (it)
J-M Languages (21)
- Japanese (ja)
- Javanese (jw)
- Kannada (kn)
- Kazakh (kk)
- Khmer (km)
- Korean (ko)
- Lao (lo)
- Latin (la)
- Latvian (lv)
- Lingala (ln)
- Lithuanian (lt)
- Luxembourgish (lb)
- Macedonian (mk)
- Malagasy (mg)
- Malay (ms)
- Malayalam (ml)
- Maltese (mt)
- Marathi (mr)
- Mongolian (mn)
- Myanmar (my)
- Maori (mi)
N-S Languages (23)
- Nepali (ne)
- Norwegian (no)
- Nynorsk (nn)
- Occitan (oc)
- Pashto (ps)
- Persian (fa)
- Polish (pl)
- Portuguese (pt)
- Punjabi (pa)
- Romanian (ro)
- Russian (ru)
- Sanskrit (sa)
- Serbian (sr)
- Shona (sn)
- Sindhi (sd)
- Sinhala (si)
- Slovak (sk)
- Slovenian (sl)
- Somali (so)
- Spanish (es)
- Sundanese (su)
- Swahili (sw)
- Swedish (sv)
T-Z Languages (16)
- Tagalog (tl)
- Tajik (tg)
- Tamil (ta)
- Tatar (tt)
- Telugu (te)
- Thai (th)
- Tibetan (bo)
- Turkish (tr)
- Turkmen (tk)
- Ukrainian (uk)
- Urdu (ur)
- Uzbek (uz)
- Vietnamese (vi)
- Welsh (cy)
- Yiddish (yi)
- Yoruba (yo)
Language Aliases
Some languages can be specified using alternative names:

| Alias | Language Code |
|---|---|
| Burmese | my (Myanmar) |
| Valencian | ca (Catalan) |
| Flemish | nl (Dutch) |
| Haitian | ht (Haitian Creole) |
| Letzeburgesch | lb (Luxembourgish) |
| Pushto | ps (Pashto) |
| Panjabi | pa (Punjabi) |
| Moldavian / Moldovan | ro (Romanian) |
| Sinhalese | si (Sinhala) |
| Castilian | es (Spanish) |
| Mandarin | zh (Chinese) |
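
As a minimal sketch of how such aliases can be resolved, the snippet below copies the table into a plain dictionary and looks names up case-insensitively. The names `LANGUAGE_ALIASES` and `resolve_alias` are hypothetical and exist only for this illustration; in the openai-whisper package the corresponding lookup table is `whisper.tokenizer.TO_LANGUAGE_CODE`.

```python
from typing import Optional

# Alias -> language code pairs copied from the table above.
LANGUAGE_ALIASES = {
    "burmese": "my",
    "valencian": "ca",
    "flemish": "nl",
    "haitian": "ht",
    "letzeburgesch": "lb",
    "pushto": "ps",
    "panjabi": "pa",
    "moldavian": "ro",
    "moldovan": "ro",
    "sinhalese": "si",
    "castilian": "es",
    "mandarin": "zh",
}


def resolve_alias(name: str) -> Optional[str]:
    """Return the language code for an alias, or None if it is not in the table."""
    return LANGUAGE_ALIASES.get(name.strip().lower())


print(resolve_alias("Flemish"))   # nl
print(resolve_alias("Mandarin"))  # zh
```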
Performance by Language
Whisper’s performance varies widely depending on the language. The figure below shows WER (Word Error Rate) or CER (Character Error Rate, shown in italics) for the large-v3 and large-v2 models evaluated on the Common Voice 15 and Fleurs datasets.
Lower WER/CER values indicate better performance. Additional metrics for other models can be found in Appendix D of the Whisper paper.
Using Languages
Specifying Language in Transcription
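
A minimal sketch of passing a language explicitly, assuming the openai-whisper Python package. The helper name `transcribe_in_language` and the file name `audio.mp3` are placeholders for this example; `model` is expected to be the object returned by `whisper.load_model()`, whose `transcribe()` method accepts a `language` keyword.

```python
from typing import Any


def transcribe_in_language(model: Any, audio_path: str, language: str) -> str:
    """Transcribe audio_path, telling the model which language to expect.

    `language` is a code from the list above, such as "ja" or "fr".
    """
    result = model.transcribe(audio_path, language=language)
    return result["text"]


# Typical use (requires openai-whisper and a real audio file):
#
#   import whisper
#   model = whisper.load_model("medium")
#   print(transcribe_in_language(model, "audio.mp3", "ja"))
```

Setting the language up front skips the detection step, which can help when the first seconds of a recording are silent or noisy.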
Command-line Usage
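
The `whisper` command-line tool takes the language through its `--language` flag, which accepts either a code (`ja`) or a name (`Japanese`). To stay in Python, the sketch below only builds the argument list; `whisper_cli_command` and `japanese.wav` are hypothetical names used for illustration.

```python
import shlex
from typing import List


def whisper_cli_command(audio_path: str, language: str, model: str = "medium") -> List[str]:
    """Build the argv list for transcribing one file in a given language."""
    return ["whisper", audio_path, "--language", language, "--model", model]


cmd = whisper_cli_command("japanese.wav", "Japanese")
print(shlex.join(cmd))  # whisper japanese.wav --language Japanese --model medium

# To actually run it (requires the whisper CLI on PATH):
#
#   import subprocess
#   subprocess.run(cmd, check=True)
```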
Automatic Language Detection
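
When no `language` argument is given, `transcribe()` detects the language itself and reports it in the returned dictionary under the `"language"` key. The sketch below assumes the openai-whisper package; `transcribe_with_detection` is a hypothetical helper name.

```python
from typing import Any, Tuple


def transcribe_with_detection(model: Any, audio_path: str) -> Tuple[str, str]:
    """Transcribe without naming a language; return (detected_code, text)."""
    result = model.transcribe(audio_path)  # no language= argument
    return result["language"], result["text"]


# Typical use (requires openai-whisper and a real audio file):
#
#   import whisper
#   model = whisper.load_model("medium")
#   code, text = transcribe_with_detection(model, "audio.mp3")
#   print(f"Detected language: {code}")
```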
Getting All Language Probabilities
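
For the full probability distribution, the lower-level `model.detect_language()` call returns a dictionary that maps each language code to a probability. The sketch below follows the pattern from the openai-whisper README and assumes a recent release (the `n_mels` argument is not present in older versions). `language_probabilities` and `top_languages` are hypothetical helper names.

```python
from typing import Any, Dict, List, Tuple


def language_probabilities(model: Any, audio_path: str) -> Dict[str, float]:
    """Return {language_code: probability} for the first 30 seconds of audio."""
    import whisper  # imported lazily so top_languages() works without it

    audio = whisper.load_audio(audio_path)
    audio = whisper.pad_or_trim(audio)  # pad or cut to a 30-second window
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    return probs


def top_languages(probs: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most likely languages, most probable first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# Typical use (requires openai-whisper and a real audio file):
#
#   import whisper
#   model = whisper.load_model("medium")
#   for code, p in top_languages(language_probabilities(model, "audio.mp3")):
#       print(f"{code}: {p:.3f}")
```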
Translation to English
Translating Non-English Speech
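
Translation uses the same `transcribe()` call with `task="translate"`, which produces English text from non-English speech. A minimal sketch, assuming the openai-whisper package; `translate_to_english` is a hypothetical helper name, and the source language is optional because it can be detected automatically.

```python
from typing import Any, Optional


def translate_to_english(model: Any, audio_path: str,
                         source_language: Optional[str] = None) -> str:
    """Translate speech in audio_path to English text."""
    options = {"task": "translate"}
    if source_language is not None:
        options["language"] = source_language  # skip auto-detection
    result = model.transcribe(audio_path, **options)
    return result["text"]


# Typical use (requires openai-whisper and a real audio file):
#
#   import whisper
#   model = whisper.load_model("medium")
#   print(translate_to_english(model, "japanese.wav", "ja"))
```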
Command-line Translation
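
On the command line, translation is selected with `--task translate`. As before, this sketch only assembles the argument list in Python; `whisper_translate_command` and the file name are hypothetical.

```python
from typing import List


def whisper_translate_command(audio_path: str, language: str,
                              model: str = "medium") -> List[str]:
    """Build the argv list for translating one file to English."""
    return [
        "whisper", audio_path,
        "--model", model,
        "--language", language,
        "--task", "translate",
    ]


# Equivalent shell command:
#   whisper japanese.wav --model medium --language Japanese --task translate
translate_cmd = whisper_translate_command("japanese.wav", "Japanese")
```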
Language-Specific Considerations
Languages Without Spaces
For languages that don’t use spaces (Chinese, Japanese, Thai, Lao, Myanmar, Cantonese), the tokenizer uses special word splitting.
Character-based Metrics
Some languages use Character Error Rate (CER) instead of Word Error Rate (WER) for evaluation:
- Chinese (zh)
- Japanese (ja)
- Thai (th)
- Other languages shown in italics in the performance chart
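
To see why character-level scoring suits these languages, the sketch below computes both metrics with a plain Levenshtein distance. It is an illustration only, not the evaluation code behind the published figures, and it skips the text normalization a real evaluation would apply. All function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of units (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edits over whitespace-separated words (reference must be non-empty)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits over characters, ignoring spaces (reference must be non-empty)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# A Japanese sentence has no spaces, so it counts as a single "word":
print(wer("今日は晴れ", "今日は雨"))  # 1.0
print(cer("今日は晴れ", "今日は雨"))  # 0.4
```

With no spaces to split on, WER treats the whole sentence as one word and reports 100% error for a two-character difference, while CER reflects how much of the text is actually wrong.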
Best Practices
High-Resource Languages
English, Spanish, French, German, Chinese, and other widely spoken languages generally have the best performance.
Low-Resource Languages
Less common languages may have higher error rates. Consider using larger models for better accuracy.
Language Detection
If unsure of the language, use automatic detection. It’s reliable for most well-represented languages.
Model Selection
Use larger models (medium, large) for better performance on challenging languages or translation tasks.