Overview

The SUPPORTED_LANGUAGES dictionary contains all language codes supported by the ChatterboxMultilingualTTS model. Use these language codes with the language_id parameter when generating speech.

Language Codes

ChatterboxMultilingualTTS supports 23 languages:
Language Code | Language Name
------------- | -------------
ar            | Arabic
da            | Danish
de            | German
el            | Greek
en            | English
es            | Spanish
fi            | Finnish
fr            | French
he            | Hebrew
hi            | Hindi
it            | Italian
ja            | Japanese
ko            | Korean
ms            | Malay
nl            | Dutch
no            | Norwegian
pl            | Polish
pt            | Portuguese
ru            | Russian
sv            | Swedish
sw            | Swahili
tr            | Turkish
zh            | Chinese
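
The table corresponds to a plain Python dictionary. The local copy below mirrors the table for illustration, so you can sanity-check codes without installing the library; in real code, import SUPPORTED_LANGUAGES from chatterbox instead:

```python
# Local mirror of the table above, for illustration only.
# In practice: from chatterbox import SUPPORTED_LANGUAGES
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "da": "Danish", "de": "German", "el": "Greek",
    "en": "English", "es": "Spanish", "fi": "Finnish", "fr": "French",
    "he": "Hebrew", "hi": "Hindi", "it": "Italian", "ja": "Japanese",
    "ko": "Korean", "ms": "Malay", "nl": "Dutch", "no": "Norwegian",
    "pl": "Polish", "pt": "Portuguese", "ru": "Russian", "sv": "Swedish",
    "sw": "Swahili", "tr": "Turkish", "zh": "Chinese",
}

print(len(SUPPORTED_LANGUAGES))   # 23
print(SUPPORTED_LANGUAGES["fr"])  # French
```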

Usage

Import and use the SUPPORTED_LANGUAGES dictionary in your code:
from chatterbox import ChatterboxMultilingualTTS, SUPPORTED_LANGUAGES

# Print all supported languages
for code, name in SUPPORTED_LANGUAGES.items():
    print(f"{code}: {name}")

# Check if a language is supported
if "fr" in SUPPORTED_LANGUAGES:
    print(f"French is supported: {SUPPORTED_LANGUAGES['fr']}")

# Get supported languages from the class method
languages = ChatterboxMultilingualTTS.get_supported_languages()
print(languages)
# {'ar': 'Arabic', 'da': 'Danish', ...}

Using Language Codes

Pass the language code to the language_id parameter when generating speech:
import torchaudio
from chatterbox import ChatterboxMultilingualTTS

device = "cuda"  # or "cpu" if no GPU is available
model = ChatterboxMultilingualTTS.from_pretrained(device)

# Generate speech in different languages
languages_to_test = {
    "en": "Hello, how are you today?",
    "es": "Hola, ¿cómo estás hoy?",
    "fr": "Bonjour, comment allez-vous aujourd'hui?",
    "de": "Hallo, wie geht es dir heute?",
    "ja": "こんにちは、今日はお元気ですか?",
    "zh": "你好,你今天好吗?",
}

for lang_code, text in languages_to_test.items():
    audio = model.generate(
        text=text,
        language_id=lang_code,
        audio_prompt_path="voice_sample.wav"
    )
    torchaudio.save(f"output_{lang_code}.wav", audio, model.sr)

Cross-Lingual Voice Cloning

You can clone a voice from one language and use it to synthesize speech in any other supported language:
from chatterbox import ChatterboxMultilingualTTS
import torchaudio

device = "cuda"
model = ChatterboxMultilingualTTS.from_pretrained(device)

# Clone an English voice
model.prepare_conditionals("english_speaker.wav")

# Use that voice to speak multiple languages
texts = {
    "en": "This is an English voice.",
    "fr": "C'est une voix anglaise qui parle français.",
    "es": "Esta es una voz inglesa hablando español.",
    "de": "Das ist eine englische Stimme, die Deutsch spricht.",
    "ja": "これは日本語を話す英語の声です。",
}

for lang_code, text in texts.items():
    audio = model.generate(text=text, language_id=lang_code)
    torchaudio.save(f"cross_lingual_{lang_code}.wav", audio, model.sr)

Validation

The model validates language codes automatically; passing an invalid code raises a ValueError:
try:
    audio = model.generate(
        text="Hello world",
        language_id="invalid_code"
    )
except ValueError as e:
    print(e)
    # ValueError: Unsupported language_id 'invalid_code'. 
    # Supported languages: ar, da, de, el, en, es, fi, fr, he, hi, it, ja, ko, ms, nl, no, pl, pt, ru, sv, sw, tr, zh
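
The same check can be reproduced without loading the model. The helper below is an illustrative sketch, not part of the chatterbox API; it uses an abridged local stand-in for SUPPORTED_LANGUAGES to show the shape of the check:

```python
# Illustrative validation helper (not the library's API).
# Abridged stand-in for the full SUPPORTED_LANGUAGES mapping.
SUPPORTED_LANGUAGES = {"ar": "Arabic", "en": "English", "fr": "French"}

def check_language_id(language_id: str) -> None:
    """Raise ValueError if language_id is not a supported code."""
    if language_id not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(
            f"Unsupported language_id '{language_id}'. "
            f"Supported languages: {supported}"
        )

check_language_id("fr")  # passes silently
try:
    check_language_id("invalid_code")
except ValueError as e:
    print(e)
```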

Notes

  • Language codes are case-insensitive (“EN” and “en” both work)
  • The model performs automatic text normalization for each language, including language-specific punctuation
  • Cross-lingual voice cloning works best when the reference audio is clear and at least 5-10 seconds long
  • Some languages may require specific fonts or Unicode support for proper text display
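
If your input codes may arrive uppercase or with stray whitespace, normalizing before lookup keeps the behavior explicit. A minimal sketch, assuming the mapping uses lowercase keys as in the table above (the helper name and abridged dict are illustrative, not library API):

```python
# Abridged stand-in for the library's SUPPORTED_LANGUAGES mapping.
SUPPORTED_LANGUAGES = {"en": "English", "fr": "French", "ja": "Japanese"}

def normalize_language_id(code: str) -> str:
    """Lowercase and trim a user-supplied code; reject unknown codes."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language_id '{code}'")
    return normalized

print(normalize_language_id("EN"))  # en
```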
