Chatterbox Multilingual generates natural speech in 23 languages with zero-shot voice cloning. It also supports cross-lingual synthesis: you can take a reference voice recorded in one language and generate speech in another.
Supported Languages
Chatterbox Multilingual supports the following 23 languages:
| Language | Code | Language | Code |
|---|---|---|---|
| Arabic | ar | Korean | ko |
| Chinese | zh | Malay | ms |
| Danish | da | Dutch | nl |
| English | en | Norwegian | no |
| Finnish | fi | Polish | pl |
| French | fr | Portuguese | pt |
| German | de | Russian | ru |
| Greek | el | Spanish | es |
| Hebrew | he | Swedish | sv |
| Hindi | hi | Swahili | sw |
| Italian | it | Turkish | tr |
| Japanese | ja | | |
You can programmatically access the supported languages using:
from chatterbox.mtl_tts import ChatterboxMultilingualTTS
languages = ChatterboxMultilingualTTS.get_supported_languages()
print(languages) # {'ar': 'Arabic', 'da': 'Danish', ...}
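Because the call above returns a plain dict mapping codes to names, a reverse lookup by language name takes only a few lines. The sketch below is a hypothetical helper (`find_language_code` is not part of Chatterbox) and uses a small stand-in dict so it runs without the model installed; in practice you would pass the dict returned by `get_supported_languages()`.

```python
from typing import Mapping, Optional


def find_language_code(languages: Mapping[str, str], name: str) -> Optional[str]:
    """Return the code whose language name matches `name`, ignoring case."""
    wanted = name.strip().casefold()
    for code, language_name in languages.items():
        if language_name.casefold() == wanted:
            return code
    return None


# Stand-in for ChatterboxMultilingualTTS.get_supported_languages()
sample_languages = {"ar": "Arabic", "da": "Danish", "fr": "French"}

print(find_language_code(sample_languages, "french"))   # fr
print(find_language_code(sample_languages, "Klingon"))  # None
```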
Using the language_id Parameter
The language_id parameter specifies which language to generate. It’s required for the multilingual model.
Basic Usage
import torchaudio as ta
from chatterbox.mtl_tts import ChatterboxMultilingualTTS
# Initialize the multilingual model
model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")
# Generate French speech
french_text = "Bonjour, comment ça va? Ceci est le modèle de synthèse vocale multilingue Chatterbox."
wav = model.generate(french_text, language_id="fr")
ta.save("french_output.wav", wav, model.sr)
# Generate Chinese speech
chinese_text = "你好,今天天气真不错,希望你有一个愉快的周末。"
wav = model.generate(chinese_text, language_id="zh")
ta.save("chinese_output.wav", wav, model.sr)
Use the two-letter ISO 639-1 language codes. Matching is case-insensitive:
# Both work - the model converts to lowercase internally
wav = model.generate(text, language_id="fr") # Lowercase
wav = model.generate(text, language_id="FR") # Uppercase (converted internally)
Validation
The model validates the language_id and raises an error for unsupported languages:
try:
    wav = model.generate("Hello", language_id="xx")  # Invalid code
except ValueError as e:
    print(e)  # "Unsupported language_id 'xx'. Supported languages: ar, da, de..."
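If language codes come from user input, it can help to check them before any audio is generated. The following is an illustrative re-implementation of the behavior described above, not the library's own code: `normalize_language_id` is a hypothetical name, the whitespace stripping is an extra assumption, and the small `supported` dict stands in for the result of `get_supported_languages()`.

```python
from typing import Mapping


def normalize_language_id(language_id: str, supported: Mapping[str, str]) -> str:
    """Lowercase `language_id` and check it against the supported codes."""
    code = language_id.strip().lower()
    if code not in supported:
        choices = ", ".join(sorted(supported))
        raise ValueError(
            f"Unsupported language_id '{language_id}'. Supported languages: {choices}"
        )
    return code


# Stand-in for the dict returned by get_supported_languages()
supported = {"ar": "Arabic", "de": "German", "fr": "French"}

print(normalize_language_id("FR", supported))  # fr
try:
    normalize_language_id("xx", supported)
except ValueError as err:
    print(err)
```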
Examples in Multiple Languages
European Languages
from chatterbox.mtl_tts import ChatterboxMultilingualTTS
import torchaudio as ta
model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")
# Spanish
spanish = "Hola, ¿cómo estás? Este es un ejemplo en español."
wav = model.generate(spanish, language_id="es")
ta.save("spanish.wav", wav, model.sr)
# German
german = "Guten Tag, wie geht es Ihnen heute?"
wav = model.generate(german, language_id="de")
ta.save("german.wav", wav, model.sr)
# Italian
italian = "Buongiorno, come stai oggi?"
wav = model.generate(italian, language_id="it")
ta.save("italian.wav", wav, model.sr)
# French (from example_tts.py)
french = "Bonjour, comment ça va? Ceci est le modèle de synthèse vocale multilingue Chatterbox, il prend en charge 23 langues."
wav = model.generate(french, language_id="fr")
ta.save("french.wav", wav, model.sr)
Asian Languages
# Chinese (from example_tts.py)
chinese = "你好,今天天气真不错,希望你有一个愉快的周末。"
wav = model.generate(chinese, language_id="zh")
ta.save("chinese.wav", wav, model.sr)
# Japanese
japanese = "こんにちは、お元気ですか?"
wav = model.generate(japanese, language_id="ja")
ta.save("japanese.wav", wav, model.sr)
# Korean
korean = "안녕하세요, 오늘 날씨가 참 좋네요."
wav = model.generate(korean, language_id="ko")
ta.save("korean.wav", wav, model.sr)
# Hindi
hindi = "नमस्ते, आप कैसे हैं?"
wav = model.generate(hindi, language_id="hi")
ta.save("hindi.wav", wav, model.sr)
Other Languages
# Arabic
arabic = "مرحبا، كيف حالك اليوم؟"
wav = model.generate(arabic, language_id="ar")
ta.save("arabic.wav", wav, model.sr)
# Hebrew
hebrew = "שלום, מה שלומך?"
wav = model.generate(hebrew, language_id="he")
ta.save("hebrew.wav", wav, model.sr)
# Russian
russian = "Здравствуйте, как дела?"
wav = model.generate(russian, language_id="ru")
ta.save("russian.wav", wav, model.sr)
# Turkish
turkish = "Merhaba, nasılsınız?"
wav = model.generate(turkish, language_id="tr")
ta.save("turkish.wav", wav, model.sr)
Voice Cloning Across Languages
You can clone a voice from one language and use it to speak another language:
# Use an English voice reference to speak French
wav = model.generate(
    "Bonjour, comment ça va?",
    language_id="fr",
    audio_prompt_path="english_speaker.wav"
)
Accent Transfer: When using a reference voice from one language to generate speech in another, the output may inherit the accent of the reference language. See the tips below for managing this.
Accent and Language Transfer
Understanding Accent Transfer
When you use a voice from Language A to generate speech in Language B, the model may produce speech with an accent characteristic of Language A:
# English speaker generating Spanish - may have English accent
wav = model.generate(
    "Hola, ¿cómo estás?",
    language_id="es",
    audio_prompt_path="native_english_speaker.wav"
)
Reducing Accent Transfer
To minimize accent transfer and get more native-sounding pronunciation, set cfg_weight=0:
# Reduce accent transfer from reference audio
wav = model.generate(
    "Hola, ¿cómo estás?",
    language_id="es",
    audio_prompt_path="english_speaker.wav",
    cfg_weight=0.0  # Reduces accent influence
)
From the README: “Ensure that the reference clip matches the specified language tag. Otherwise, language transfer outputs may inherit the accent of the reference clip’s language. To mitigate this, set cfg_weight to 0.”
When to Use Accent Transfer
Sometimes accent transfer is desirable:
- Creating characters with foreign accents
- Representing non-native speakers authentically
- Maintaining voice consistency across languages for a specific character
# Intentionally keep the accent for character consistency
wav = model.generate(
    "Buenos días",
    language_id="es",
    audio_prompt_path="character_voice.wav",
    cfg_weight=0.5  # Maintain accent
)
Configuration Tips from README
The README provides specific guidance for multilingual usage:
General Use
Default Settings: The default settings (exaggeration=0.5, cfg_weight=0.5) work well for most prompts across all languages.
# Recommended defaults
wav = model.generate(
    text,
    language_id="fr",
    exaggeration=0.5,  # Default
    cfg_weight=0.5     # Default
)
Fast Speaking Reference
If your reference speaker has a fast speaking style, lower cfg_weight:
# For fast-speaking reference audio
wav = model.generate(
    text,
    language_id="es",
    audio_prompt_path="fast_speaker.wav",
    cfg_weight=0.3  # Improves pacing
)
Matching Reference Language
For best results, match the reference audio language to your target language:
# Best: French reference for French synthesis
wav = model.generate(
    "Bonjour!",
    language_id="fr",
    audio_prompt_path="french_speaker.wav"
)
# Acceptable: English reference for French (may have accent)
wav = model.generate(
    "Bonjour!",
    language_id="fr",
    audio_prompt_path="english_speaker.wav",
    cfg_weight=0.0  # Reduce accent
)
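The guidance above can be folded into a small decision helper. This is a minimal sketch under stated assumptions: `choose_cfg_weight` is a hypothetical function, the caller must already know the language of the reference clip (nothing here detects it), and the two values are the ones used in this guide (0.5 as the default, 0.0 to mitigate accent transfer).

```python
def choose_cfg_weight(target_language: str, reference_language: str,
                      keep_accent: bool = False) -> float:
    """Pick cfg_weight following the README guidance quoted in this guide."""
    same_language = target_language.lower() == reference_language.lower()
    if same_language or keep_accent:
        return 0.5  # documented default
    return 0.0      # mitigate accent carried over from the reference clip


print(choose_cfg_weight("fr", "fr"))                    # 0.5
print(choose_cfg_weight("fr", "en"))                    # 0.0
print(choose_cfg_weight("es", "en", keep_accent=True))  # 0.5
```

The returned value would then be passed as `cfg_weight=` to `model.generate`.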
Complete Multilingual Example
import torchaudio as ta
import torch
from chatterbox.mtl_tts import ChatterboxMultilingualTTS
# Auto-detect device
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print(f"Using device: {device}")
# Initialize model
model = ChatterboxMultilingualTTS.from_pretrained(device=device)
# Generate in multiple languages
languages = [
    ("Hello, how are you today?", "en", "english.wav"),
    ("Bonjour, comment ça va?", "fr", "french.wav"),
    ("Hola, ¿cómo estás?", "es", "spanish.wav"),
    ("你好,今天天气真不错。", "zh", "chinese.wav"),
]
for text, lang_id, output_file in languages:
    print(f"Generating {lang_id}...")
    wav = model.generate(text, language_id=lang_id)
    ta.save(output_file, wav, model.sr)
    print(f"Saved {output_file}")
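For longer batches you may not want one bad language code to abort the whole run. The sketch below wraps the same loop so each job reports its own status. `synthesize_batch` and `StubModel` are hypothetical names; the stub (including its placeholder sample rate) exists only so the example runs without downloading model weights.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def synthesize_batch(
    model: Any,
    jobs: Iterable[Tuple[str, str, str]],
    save: Callable[[str, Any, int], None],
) -> Dict[str, str]:
    """Run each (text, language_id, output_file) job and report its status."""
    results = {}
    for text, lang_id, output_file in jobs:
        try:
            wav = model.generate(text, language_id=lang_id)
            save(output_file, wav, model.sr)
            results[output_file] = "ok"
        except ValueError as err:  # e.g. an unsupported language_id
            results[output_file] = f"failed: {err}"
    return results


class StubModel:
    """Hypothetical stand-in for the real model."""
    sr = 24000  # placeholder sample rate, not taken from the library

    def generate(self, text: str, language_id: str) -> list:
        if language_id not in {"en", "fr"}:
            raise ValueError(f"Unsupported language_id '{language_id}'")
        return [0.0] * 4  # placeholder "audio"


saved = []
report = synthesize_batch(
    StubModel(),
    [("Hello", "en", "english.wav"), ("Hallo", "xx", "bad.wav")],
    save=lambda path, wav, sr: saved.append(path),
)
print(report)
```

With the real model you would pass `ta.save` as the `save` argument, since it takes the same three arguments used in the example above.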
Model Selection
Choose the right model for your needs:
| Use Case | Recommended Model |
|---|---|
| English only | Chatterbox or Chatterbox Turbo |
| Multiple languages | Chatterbox Multilingual |
| English with paralinguistic tags | Chatterbox Turbo |
| Global applications | Chatterbox Multilingual |
| Low-latency voice agents (English) | Chatterbox Turbo |
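The table can also be expressed as a small lookup function. This is only a sketch of the table's logic: `recommend_model` is a hypothetical helper, and the case of paralinguistic tags in a non-English language is not covered by the table, so the function simply falls back to the multilingual model there.

```python
from typing import Iterable


def recommend_model(languages: Iterable[str], paralinguistic_tags: bool = False,
                    low_latency: bool = False) -> str:
    """Map a use case onto the recommendation table above."""
    english_only = {code.lower() for code in languages} == {"en"}
    if not english_only:
        return "Chatterbox Multilingual"
    if paralinguistic_tags or low_latency:
        return "Chatterbox Turbo"
    return "Chatterbox or Chatterbox Turbo"


print(recommend_model(["en"]))                    # Chatterbox or Chatterbox Turbo
print(recommend_model(["en"], low_latency=True))  # Chatterbox Turbo
print(recommend_model(["en", "fr", "ja"]))        # Chatterbox Multilingual
```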
Punctuation Handling
The multilingual model handles language-specific punctuation:
# Chinese punctuation (from mtl_tts.py:86)
sentence_enders = {".", "!", "?", "-", ",", "、", ",", "。", "?", "!"}
The model automatically normalizes punctuation for each language.
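To make the role of such a set concrete, here is one plausible use: appending a period when the text does not already end with a recognized sentence ender. This is a sketch of the idea rather than a copy of the library's logic. `ensure_sentence_ender` is a hypothetical name, and the set includes the full-width comma, question mark, and exclamation mark, which is an assumption about its intended contents.

```python
SENTENCE_ENDERS = {".", "!", "?", "-", ",", "、", ",", "。", "?", "!"}


def ensure_sentence_ender(text: str) -> str:
    """Append a period when the text does not already end with an ender."""
    stripped = text.rstrip()
    if stripped and stripped[-1] not in SENTENCE_ENDERS:
        return stripped + "."
    return stripped


print(ensure_sentence_ender("Bonjour"))  # Bonjour.
print(ensure_sentence_ender("你好。"))    # 你好。
```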
Advanced Configuration
Full Parameter Example
wav = model.generate(
    text="Votre texte ici",
    language_id="fr",
    audio_prompt_path="reference.wav",
    exaggeration=0.5,        # Expressiveness
    cfg_weight=0.5,          # Conditioning strength
    temperature=0.8,         # Sampling randomness
    repetition_penalty=2.0,  # Reduce repetition
    min_p=0.05,              # Min probability threshold
    top_p=1.0                # Nucleus sampling
)
See the Configuration guide for detailed parameter explanations.
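If you reuse the same settings across many calls, it can be convenient to keep them in one place. The sketch below uses a frozen dataclass for that purpose. `GenerationSettings` is a hypothetical container, not a Chatterbox class; its field names mirror the keyword arguments in the example above, and the values are copied from that example.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container; field names mirror the keyword arguments above."""
    exaggeration: float = 0.5
    cfg_weight: float = 0.5
    temperature: float = 0.8
    repetition_penalty: float = 2.0
    min_p: float = 0.05
    top_p: float = 1.0


DEFAULTS = GenerationSettings()
# Preset for cross-lingual cloning: same values, but cfg_weight set to 0.0
NATIVE_ACCENT = replace(DEFAULTS, cfg_weight=0.0)

print(asdict(NATIVE_ACCENT))
# With a loaded model: model.generate(text, language_id="fr", **asdict(NATIVE_ACCENT))
```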
Troubleshooting
Unsupported Language Error
# Error: ValueError: Unsupported language_id 'xx'
languages = ChatterboxMultilingualTTS.get_supported_languages()
print(f"Supported: {list(languages.keys())}")
Poor Pronunciation
- Ensure correct language_id - Verify you’re using the right code
- Check text encoding - Use UTF-8 for non-Latin scripts
- Adjust cfg_weight - Try cfg_weight=0.0 for more native pronunciation
- Use native reference audio - Match reference language to target language
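A mismatched language_id is easy to miss when text and codes come from different sources. For languages written in a non-Latin script, a rough sanity check is to confirm that the text contains at least one letter from the expected script. The sketch below does this with the standard `unicodedata` module. The `EXPECTED_SCRIPTS` mapping and `script_matches` are assumptions of this example, not something Chatterbox exposes, and Latin-script languages are deliberately not checked.

```python
import unicodedata

# Unicode character-name prefixes expected for non-Latin-script languages.
# This mapping is an assumption of the sketch, not part of Chatterbox.
EXPECTED_SCRIPTS = {
    "ar": ("ARABIC",),
    "el": ("GREEK",),
    "he": ("HEBREW",),
    "hi": ("DEVANAGARI",),
    "ja": ("HIRAGANA", "KATAKANA", "CJK"),
    "ko": ("HANGUL",),
    "ru": ("CYRILLIC",),
    "zh": ("CJK",),
}


def script_matches(text: str, language_id: str) -> bool:
    """Return False when text has no letter from the language's usual script."""
    prefixes = EXPECTED_SCRIPTS.get(language_id.lower())
    if prefixes is None:
        return True  # Latin-script languages are not checked here
    return any(
        unicodedata.name(ch, "").startswith(prefixes)
        for ch in text
        if ch.isalpha()
    )


print(script_matches("Здравствуйте, как дела?", "ru"))  # True
print(script_matches("Hello", "ru"))                    # False
```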
Strong Accent
If the accent from reference audio is too strong:
# Minimize accent influence
wav = model.generate(
    text,
    language_id="es",  # Replace with your target language code
    audio_prompt_path="reference.wav",
    cfg_weight=0.0  # Mitigates accent influence from the reference clip
)