Overview

LangShazam uses OpenAI’s Whisper model (whisper-1) for language detection, which supports 99 languages across various language families. The system can accurately identify spoken language in real-time through audio streaming.
Language detection works best with audio clips between 4 and 15 seconds. Within that range, longer and clearer audio generally results in more accurate detection.

Supported Languages

Whisper supports the following languages:
  • Afrikaans (af)
  • Albanian (sq)
  • Amharic (am)
  • Arabic (ar)
  • Armenian (hy)
  • Assamese (as)
  • Azerbaijani (az)
  • Bashkir (ba)
  • Basque (eu)
  • Belarusian (be)
  • Bengali (bn)
  • Bosnian (bs)
  • Breton (br)
  • Bulgarian (bg)
  • Burmese (my)
  • Catalan (ca)
  • Chinese (zh)
  • Croatian (hr)
  • Czech (cs)
  • Danish (da)
  • Dutch (nl)
  • English (en)
  • Estonian (et)
  • Faroese (fo)
  • Finnish (fi)
  • French (fr)
  • Galician (gl)
  • Georgian (ka)
  • German (de)
  • Greek (el)
  • Gujarati (gu)
  • Haitian Creole (ht)
  • Hausa (ha)
  • Hawaiian (haw)
  • Hebrew (he)
  • Hindi (hi)
  • Hungarian (hu)
  • Icelandic (is)
  • Indonesian (id)
  • Italian (it)
  • Japanese (ja)
  • Javanese (jv)
  • Kannada (kn)
  • Kazakh (kk)
  • Khmer (km)
  • Korean (ko)
  • Lao (lo)
  • Latin (la)
  • Latvian (lv)
  • Lingala (ln)
  • Lithuanian (lt)
  • Luxembourgish (lb)
  • Macedonian (mk)
  • Malagasy (mg)
  • Malay (ms)
  • Malayalam (ml)
  • Maltese (mt)
  • Maori (mi)
  • Marathi (mr)
  • Mongolian (mn)
  • Nepali (ne)
  • Norwegian (no)
  • Nynorsk (nn)
  • Occitan (oc)
  • Pashto (ps)
  • Persian (fa)
  • Polish (pl)
  • Portuguese (pt)
  • Punjabi (pa)
  • Romanian (ro)
  • Russian (ru)
  • Sanskrit (sa)
  • Serbian (sr)
  • Shona (sn)
  • Sindhi (sd)
  • Sinhala (si)
  • Slovak (sk)
  • Slovenian (sl)
  • Somali (so)
  • Spanish (es)
  • Sundanese (su)
  • Swahili (sw)
  • Swedish (sv)
  • Tagalog (tl)
  • Tajik (tg)
  • Tamil (ta)
  • Tatar (tt)
  • Telugu (te)
  • Thai (th)
  • Tibetan (bo)
  • Turkish (tr)
  • Turkmen (tk)
  • Ukrainian (uk)
  • Urdu (ur)
  • Uzbek (uz)
  • Vietnamese (vi)
  • Welsh (cy)
  • Yiddish (yi)
  • Yoruba (yo)
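Clients may want to check a code against this list before acting on a detection result. The sketch below is a hypothetical helper (not part of LangShazam's API); the set shown is an excerpt of the full list above.

```python
# Hypothetical helper: check whether an ISO 639-1 code is in Whisper's
# supported set. Only a subset of the full list above is shown here.
SUPPORTED_LANGUAGES = {
    "af", "sq", "am", "ar", "hy", "en", "es", "fr", "de", "zh",
    "ja", "ko", "ru", "pt", "it", "nl", "hi", "tr", "vi", "yo",
    # ... remaining codes from the list above
}

def is_supported(code: str) -> bool:
    """Return True if the ISO 639-1 code is in the supported set."""
    return code.lower() in SUPPORTED_LANGUAGES

print(is_supported("en"))  # True
print(is_supported("xx"))  # False
```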

Language Detection Response

When audio is processed, LangShazam returns a response in the following format:
{
  "language": "en",
  "confidence": 0.9,
  "processing_time": 1.23,
  "connection_id": "conn_abc123"
}

Response Fields

  • language (string, required): The ISO 639-1 language code of the detected language (e.g., “en” for English, “es” for Spanish)
  • confidence (number, required): Confidence score for the detection (currently fixed at 0.9 as defined in backend/src/audio_processor.py:62)
  • processing_time (number, required): Time taken to process the audio in seconds, including API call latency
  • connection_id (string, required): Unique identifier for tracking the request through the system
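A client can validate that all four required fields are present before using a result. This is a minimal client-side sketch, assuming the JSON shape shown above; `parse_detection` is an illustrative name, not a LangShazam function.

```python
# Hypothetical sketch: validate a LangShazam detection response.
import json

REQUIRED_FIELDS = {"language", "confidence", "processing_time", "connection_id"}

def parse_detection(payload: str) -> dict:
    """Parse a detection response and verify required fields are present."""
    data = json.loads(payload)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"response missing fields: {sorted(missing)}")
    return data

result = parse_detection(
    '{"language": "en", "confidence": 0.9,'
    ' "processing_time": 1.23, "connection_id": "conn_abc123"}'
)
print(result["language"])  # en
```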

Detection Accuracy

Factors Affecting Accuracy

Language detection accuracy depends on several factors:

  • Audio Quality: Clear audio with minimal background noise produces better results
  • Audio Length: 4-15 seconds is optimal; shorter clips may be less accurate
  • Speaker Clarity: Clear pronunciation and a natural speaking pace improve detection
  • Language Mixing: Single-language speech works best; code-switching may confuse the model

Best Practices

  1. Record in a quiet environment: Minimize background noise for clearer audio capture
  2. Speak naturally: Use a normal speaking pace and volume
  3. Provide sufficient context: Speak for at least 4 seconds to give the model enough data
  4. Use one language at a time: Avoid mixing languages within a single recording

Technical Limitations

Be aware of these current limitations:

Audio Constraints

  • Minimum audio length: 4 seconds (4000ms)
  • Maximum audio length: 15 seconds (15000ms)
  • Minimum audio size: 20KB
  • Audio format: audio is sent to the API as MP4
  • Sample rate: 16,000 Hz (16 kHz)
These constraints are defined in backend/src/config/settings.py:26 and help ensure optimal detection quality.
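A client can pre-check a clip against these constraints before uploading. The sketch below is illustrative: the constant names are assumptions, not the actual names in backend/src/config/settings.py, though the values match those listed above.

```python
# Hypothetical pre-flight check mirroring the audio constraints above.
# Constant names are illustrative; the real values live in
# backend/src/config/settings.py.
MIN_DURATION_MS = 4000       # minimum audio length: 4 seconds
MAX_DURATION_MS = 15000      # maximum audio length: 15 seconds
MIN_SIZE_BYTES = 20 * 1024   # minimum audio size: 20KB

def validate_clip(duration_ms: int, size_bytes: int) -> None:
    """Raise ValueError if a clip violates any documented constraint."""
    if duration_ms < MIN_DURATION_MS:
        raise ValueError("audio too short (< 4 seconds)")
    if duration_ms > MAX_DURATION_MS:
        raise ValueError("audio too long (> 15 seconds)")
    if size_bytes < MIN_SIZE_BYTES:
        raise ValueError("audio smaller than 20KB minimum")

validate_clip(8000, 64_000)  # an 8-second, 64KB clip passes silently
```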

Concurrent Processing

The backend limits concurrent OpenAI API calls to 3 simultaneous requests to manage rate limits and costs. Additional requests are queued automatically (see backend/src/audio_processor.py:17).
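The queuing behavior described above can be sketched with an `asyncio.Semaphore` capped at 3. This is a minimal illustration of the pattern, not the actual code in backend/src/audio_processor.py; `detect_language` below is a stand-in for the real OpenAI call.

```python
# Minimal sketch of capped concurrency: at most 3 "API calls" run at once;
# the rest wait at the semaphore until a slot frees up.
import asyncio

MAX_CONCURRENT_CALLS = 3

async def detect_language(clip_id: int, sem: asyncio.Semaphore) -> str:
    async with sem:  # excess requests queue here
        await asyncio.sleep(0.01)  # placeholder for the API round trip
        return f"conn_{clip_id}"

async def run_batch(n: int) -> list:
    sem = asyncio.Semaphore(MAX_CONCURRENT_CALLS)
    return await asyncio.gather(*(detect_language(i, sem) for i in range(n)))

results = asyncio.run(run_batch(8))
print(results[0])  # conn_0
```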

Dialect Detection

Whisper detects the primary language but may not distinguish between regional dialects or accents. For example:
  • American English vs. British English: both detected as “en”
  • European Spanish vs. Latin American Spanish: both detected as “es”
  • Mandarin vs. Cantonese: both may be detected as “zh”

Error Handling

If language detection fails, you’ll receive an error response:
{
  "error": "Error description",
  "connection_id": "conn_abc123"
}
Common error scenarios:
  • Audio too short (< 4 seconds)
  • Audio too long (> 15 seconds)
  • OpenAI API rate limits
  • Network connectivity issues
  • Invalid audio format
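A client can branch on the error shape shown above: success responses carry a `language` field, error responses carry `error` and `connection_id`. This is a hypothetical client-side sketch; `handle_response` is an illustrative name.

```python
# Hypothetical client-side handling of the error response shape above.
import json

def handle_response(payload: str) -> str:
    """Return the detected language, or raise with the server's error text."""
    data = json.loads(payload)
    if "error" in data:
        raise RuntimeError(f"{data['error']} (connection {data['connection_id']})")
    return data["language"]

print(handle_response(
    '{"language": "es", "confidence": 0.9,'
    ' "processing_time": 0.8, "connection_id": "conn_x"}'
))  # es
```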

Need More Information?

Quickstart Guide

Get started with LangShazam

API Reference

Detailed API documentation

Build docs developers (and LLMs) love