Overview
LangShazam uses OpenAI’s Whisper model (whisper-1) for language detection, which supports 99 languages across various language families. The system can accurately identify spoken language in real-time through audio streaming.
Language detection works best with audio clips between 4-15 seconds. Longer or clearer audio generally results in more accurate detection.
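As a rough illustration of what detection with whisper-1 can look like, the sketch below sends an audio file to OpenAI's transcription endpoint and reads the language from the verbose response. It is not necessarily how LangShazam's backend does it: the function name `detect_language` is hypothetical, and `client` is assumed to be an `openai.OpenAI()` instance created by the caller.

```python
from typing import Any


def detect_language(client: Any, audio_path: str) -> str:
    """Return the language Whisper reports for the audio file at audio_path.

    `client` is assumed to be an `openai.OpenAI()` instance, or any object
    exposing the same `audio.transcriptions.create` method.
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            # verbose_json includes the detected language alongside the text
            response_format="verbose_json",
        )
    return result.language
```

Passing the client in, rather than constructing it inside the function, keeps the API key handling in one place and makes the function easy to exercise with a stub.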
Supported Languages
Whisper supports the following languages:
A-E
- Afrikaans (af)
- Albanian (sq)
- Amharic (am)
- Arabic (ar)
- Armenian (hy)
- Assamese (as)
- Azerbaijani (az)
- Bashkir (ba)
- Basque (eu)
- Belarusian (be)
- Bengali (bn)
- Bosnian (bs)
- Breton (br)
- Bulgarian (bg)
- Burmese (my)
- Catalan (ca)
- Chinese (zh)
- Croatian (hr)
- Czech (cs)
- Danish (da)
- Dutch (nl)
- English (en)
- Estonian (et)
F-L
- Faroese (fo)
- Finnish (fi)
- French (fr)
- Galician (gl)
- Georgian (ka)
- German (de)
- Greek (el)
- Gujarati (gu)
- Haitian Creole (ht)
- Hausa (ha)
- Hawaiian (haw)
- Hebrew (he)
- Hindi (hi)
- Hungarian (hu)
- Icelandic (is)
- Indonesian (id)
- Italian (it)
- Japanese (ja)
- Javanese (jv)
- Kannada (kn)
- Kazakh (kk)
- Khmer (km)
- Korean (ko)
- Lao (lo)
- Latin (la)
- Latvian (lv)
- Lingala (ln)
- Lithuanian (lt)
- Luxembourgish (lb)
M-S
- Macedonian (mk)
- Malagasy (mg)
- Malay (ms)
- Malayalam (ml)
- Maltese (mt)
- Maori (mi)
- Marathi (mr)
- Mongolian (mn)
- Nepali (ne)
- Norwegian (no)
- Nynorsk (nn)
- Occitan (oc)
- Pashto (ps)
- Persian (fa)
- Polish (pl)
- Portuguese (pt)
- Punjabi (pa)
- Romanian (ro)
- Russian (ru)
- Sanskrit (sa)
- Serbian (sr)
- Shona (sn)
- Sindhi (sd)
- Sinhala (si)
- Slovak (sk)
- Slovenian (sl)
- Somali (so)
- Spanish (es)
- Sundanese (su)
- Swahili (sw)
- Swedish (sv)
T-Z
- Tagalog (tl)
- Tajik (tg)
- Tamil (ta)
- Tatar (tt)
- Telugu (te)
- Thai (th)
- Tibetan (bo)
- Turkish (tr)
- Turkmen (tk)
- Ukrainian (uk)
- Urdu (ur)
- Uzbek (uz)
- Vietnamese (vi)
- Welsh (cy)
- Yiddish (yi)
- Yoruba (yo)
Language Detection Response
When audio is processed, LangShazam returns a response containing the fields described below.
Response Fields
- The ISO 639-1 language code of the detected language (e.g., “en” for English, “es” for Spanish).
- Confidence score for the detection (currently fixed at 0.9, as defined in backend/src/audio_processor.py:62).
- Time taken to process the audio in seconds, including API call latency.
- Unique identifier for tracking the request through the system.
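A client might model these four fields with a small typed container. The sketch below is illustrative only: the field names `language`, `confidence`, `processing_time`, and `request_id` are hypothetical placeholders (this section does not give the actual key names), as are `DetectionResult` and `parse_detection`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionResult:
    # All field names here are hypothetical placeholders, not confirmed keys.
    language: str           # language code of the detected language, e.g. "en"
    confidence: float       # confidence score for the detection
    processing_time: float  # seconds, including API call latency
    request_id: str         # identifier for tracking the request


def parse_detection(payload: dict) -> DetectionResult:
    """Convert a decoded JSON payload into a DetectionResult.

    Raises KeyError if one of the assumed keys is missing.
    """
    return DetectionResult(
        language=str(payload["language"]),
        confidence=float(payload["confidence"]),
        processing_time=float(payload["processing_time"]),
        request_id=str(payload["request_id"]),
    )
```

Because the confidence score is described as fixed, client code should probably not use it to rank or filter results.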
Detection Accuracy
Factors Affecting Accuracy
Language detection accuracy depends on several factors:
Audio Quality
Clear audio with minimal background noise produces better results.
Audio Length
4-15 seconds is optimal. Shorter clips may be less accurate.
Speaker Clarity
Clear pronunciation and a natural speaking pace improve detection.
Language Mixing
Single-language speech works best. Code-switching may confuse the model.
Best Practices
Technical Limitations
Audio Constraints
- Minimum audio length: 4 seconds (4000ms)
- Maximum audio length: 15 seconds (15000ms)
- Minimum audio size: 20KB
- Audio format: MP4 format sent to API
- Sample rate: 16000 bits per second
These constraints are defined in backend/src/config/settings.py:26 and help ensure optimal detection quality.
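The duration and size limits above can be checked before a clip is ever sent for detection. The following is a minimal sketch, not the backend's actual code: the constant names and `validate_audio` are hypothetical, and "20KB" is assumed to mean 20 × 1024 bytes.

```python
MIN_DURATION_MS = 4_000     # minimum audio length from the list above
MAX_DURATION_MS = 15_000    # maximum audio length from the list above
MIN_SIZE_BYTES = 20 * 1024  # assumption: "20KB" means 20 KiB


def validate_audio(duration_ms: int, size_bytes: int) -> list[str]:
    """Return the list of violated constraints; an empty list means the clip passes."""
    problems = []
    if duration_ms < MIN_DURATION_MS:
        problems.append(f"audio too short: {duration_ms}ms < {MIN_DURATION_MS}ms")
    if duration_ms > MAX_DURATION_MS:
        problems.append(f"audio too long: {duration_ms}ms > {MAX_DURATION_MS}ms")
    if size_bytes < MIN_SIZE_BYTES:
        problems.append(f"audio too small: {size_bytes} bytes < {MIN_SIZE_BYTES} bytes")
    return problems
```

Returning every violation at once, rather than raising on the first, lets a caller report all problems with a clip in a single message.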
Concurrent Processing
The backend limits concurrent OpenAI API calls to 3 simultaneous requests to manage rate limits and costs. Additional requests are queued automatically (see backend/src/audio_processor.py:17).
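One common way to cap concurrency like this in Python is an `asyncio.Semaphore`. The sketch below shows the general pattern under that assumption; whether the backend uses a semaphore is not stated here, and `run_limited` and `MAX_CONCURRENT_CALLS` are hypothetical names.

```python
import asyncio

MAX_CONCURRENT_CALLS = 3  # the limit described above


async def run_limited(jobs, limit=MAX_CONCURRENT_CALLS):
    """Run zero-argument coroutine functions with at most `limit` active at once.

    Results are returned in the same order as `jobs`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Jobs beyond the limit wait here until a slot frees up.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

With this pattern the queueing is implicit: callers beyond the limit simply wait on the semaphore, so no separate queue structure is needed.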
Dialect Detection
Whisper detects the primary language but may not distinguish between regional dialects or accents. For example:
- American English vs. British English: both detected as “en”
- European Spanish vs. Latin American Spanish: both detected as “es”
- Mandarin vs. Cantonese: both may be detected as “zh”
Error Handling
If language detection fails, you’ll receive an error response. Common causes include:
- Audio too short (< 4 seconds)
- Audio too long (> 15 seconds)
- OpenAI API rate limits
- Network connectivity issues
- Invalid audio format
Need More Information?
Quickstart Guide
Get started with LangShazam
API Reference
Detailed API documentation

