Chatterbox Multilingual enables you to generate natural speech in 23 languages with zero-shot voice cloning. The model supports cross-lingual synthesis, allowing you to use a voice from one language and generate speech in another.

Supported Languages

Chatterbox Multilingual supports the following 23 languages:
Language    Code        Language      Code
Arabic      ar          Korean        ko
Chinese     zh          Malay         ms
Danish      da          Dutch         nl
English     en          Norwegian     no
Finnish     fi          Polish        pl
French      fr          Portuguese    pt
German      de          Russian       ru
Greek       el          Spanish       es
Hebrew      he          Swedish       sv
Hindi       hi          Swahili       sw
Italian     it          Turkish       tr
Japanese    ja
You can programmatically access the supported languages using:
from chatterbox.mtl_tts import ChatterboxMultilingualTTS
languages = ChatterboxMultilingualTTS.get_supported_languages()
print(languages)  # {'ar': 'Arabic', 'da': 'Danish', ...}

Using the language_id Parameter

The language_id parameter specifies the language of the generated speech. It’s required by the multilingual model.

Basic Usage

import torchaudio as ta
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# Initialize the multilingual model
model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")

# Generate French speech
french_text = "Bonjour, comment ça va? Ceci est le modèle de synthèse vocale multilingue Chatterbox."
wav = model.generate(french_text, language_id="fr")
ta.save("french_output.wav", wav, model.sr)

# Generate Chinese speech
chinese_text = "你好,今天天气真不错,希望你有一个愉快的周末。"
wav = model.generate(chinese_text, language_id="zh")
ta.save("chinese_output.wav", wav, model.sr)

Language Code Format

Use two-letter ISO 639-1 language codes; they are case-insensitive:
# Both work - the model converts to lowercase internally
wav = model.generate(text, language_id="fr")  # Lowercase
wav = model.generate(text, language_id="FR")  # Uppercase (converted internally)

Validation

The model validates the language_id and raises an error for unsupported languages:
try:
    wav = model.generate("Hello", language_id="xx")  # Invalid code
except ValueError as e:
    print(e)  # "Unsupported language_id 'xx'. Supported languages: ar, da, de..."

Examples in Multiple Languages

European Languages

from chatterbox.mtl_tts import ChatterboxMultilingualTTS
import torchaudio as ta

model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")

# Spanish
spanish = "Hola, ¿cómo estás? Este es un ejemplo en español."
wav = model.generate(spanish, language_id="es")
ta.save("spanish.wav", wav, model.sr)

# German
german = "Guten Tag, wie geht es Ihnen heute?"
wav = model.generate(german, language_id="de")
ta.save("german.wav", wav, model.sr)

# Italian
italian = "Buongiorno, come stai oggi?"
wav = model.generate(italian, language_id="it")
ta.save("italian.wav", wav, model.sr)

# French (from example_tts.py)
french = "Bonjour, comment ça va? Ceci est le modèle de synthèse vocale multilingue Chatterbox, il prend en charge 23 langues."
wav = model.generate(french, language_id="fr")
ta.save("french.wav", wav, model.sr)

Asian Languages

# Chinese (from example_tts.py)
chinese = "你好,今天天气真不错,希望你有一个愉快的周末。"
wav = model.generate(chinese, language_id="zh")
ta.save("chinese.wav", wav, model.sr)

# Japanese
japanese = "こんにちは、お元気ですか?"
wav = model.generate(japanese, language_id="ja")
ta.save("japanese.wav", wav, model.sr)

# Korean
korean = "안녕하세요, 오늘 날씨가 참 좋네요."
wav = model.generate(korean, language_id="ko")
ta.save("korean.wav", wav, model.sr)

# Hindi
hindi = "नमस्ते, आप कैसे हैं?"
wav = model.generate(hindi, language_id="hi")
ta.save("hindi.wav", wav, model.sr)

Other Languages

# Arabic
arabic = "مرحبا، كيف حالك اليوم؟"
wav = model.generate(arabic, language_id="ar")
ta.save("arabic.wav", wav, model.sr)

# Hebrew
hebrew = "שלום, מה שלומך?"
wav = model.generate(hebrew, language_id="he")
ta.save("hebrew.wav", wav, model.sr)

# Russian
russian = "Здравствуйте, как дела?"
wav = model.generate(russian, language_id="ru")
ta.save("russian.wav", wav, model.sr)

# Turkish
turkish = "Merhaba, nasılsınız?"
wav = model.generate(turkish, language_id="tr")
ta.save("turkish.wav", wav, model.sr)

Voice Cloning Across Languages

You can clone a voice from one language and use it to speak another language:
# Use an English voice reference to speak French
wav = model.generate(
    "Bonjour, comment ça va?",
    language_id="fr",
    audio_prompt_path="english_speaker.wav"
)
Accent Transfer: When using a reference voice from one language to generate speech in another, the output may inherit the accent of the reference language. See the tips below for managing this.

Accent and Language Transfer

Understanding Accent Transfer

When you use a voice from Language A to generate speech in Language B, the model may produce speech with an accent characteristic of Language A:
# English speaker generating Spanish - may have English accent
wav = model.generate(
    "Hola, ¿cómo estás?",
    language_id="es",
    audio_prompt_path="native_english_speaker.wav"
)

Reducing Accent Transfer

To minimize accent transfer and get more native-sounding pronunciation, set cfg_weight=0:
# Reduce accent transfer from reference audio
wav = model.generate(
    "Hola, ¿cómo estás?",
    language_id="es",
    audio_prompt_path="english_speaker.wav",
    cfg_weight=0.0  # Reduces accent influence
)
From the README: “Ensure that the reference clip matches the specified language tag. Otherwise, language transfer outputs may inherit the accent of the reference clip’s language. To mitigate this, set cfg_weight to 0.”

When to Use Accent Transfer

Sometimes accent transfer is desirable:
  • Creating characters with foreign accents
  • Representing non-native speakers authentically
  • Maintaining voice consistency across languages for a specific character
# Intentionally keep the accent for character consistency
wav = model.generate(
    "Buenos días",
    language_id="es",
    audio_prompt_path="character_voice.wav",
    cfg_weight=0.5  # Maintain accent
)

Configuration Tips from README

The README provides specific guidance for multilingual usage:

General Use

The default settings (exaggeration=0.5, cfg_weight=0.5) work well for most prompts across all supported languages.
# Recommended defaults
wav = model.generate(
    text,
    language_id="fr",
    exaggeration=0.5,  # Default
    cfg_weight=0.5      # Default
)

Fast Speaking Reference

If your reference speaker has a fast speaking style, lower cfg_weight:
# For fast-speaking reference audio
wav = model.generate(
    text,
    language_id="es",
    audio_prompt_path="fast_speaker.wav",
    cfg_weight=0.3  # Improves pacing
)

Matching Reference Language

For best results, match the reference audio language to your target language:
# Best: French reference for French synthesis
wav = model.generate(
    "Bonjour!",
    language_id="fr",
    audio_prompt_path="french_speaker.wav"
)

# Acceptable: English reference for French (may have accent)
wav = model.generate(
    "Bonjour!",
    language_id="fr",
    audio_prompt_path="english_speaker.wav",
    cfg_weight=0.0  # Reduce accent
)

Complete Multilingual Example

import torchaudio as ta
import torch
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# Auto-detect device
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

print(f"Using device: {device}")

# Initialize model
model = ChatterboxMultilingualTTS.from_pretrained(device=device)

# Generate in multiple languages
languages = [
    ("Hello, how are you today?", "en", "english.wav"),
    ("Bonjour, comment ça va?", "fr", "french.wav"),
    ("Hola, ¿cómo estás?", "es", "spanish.wav"),
    ("你好,今天天气真不错。", "zh", "chinese.wav"),
]

for text, lang_id, output_file in languages:
    print(f"Generating {lang_id}...")
    wav = model.generate(text, language_id=lang_id)
    ta.save(output_file, wav, model.sr)
    print(f"Saved {output_file}")

Model Selection

Choose the right model for your needs:
Use Case                                Recommended Model
English only                            Chatterbox or Chatterbox Turbo
Multiple languages                      Chatterbox Multilingual
English with paralinguistic tags        Chatterbox Turbo
Global applications                     Chatterbox Multilingual
Low-latency voice agents (English)      Chatterbox Turbo

Punctuation Handling

The multilingual model handles language-specific punctuation:
# Chinese punctuation (from mtl_tts.py:86)
sentence_enders = {".", "!", "?", "-", ",", "、", ",", "。", "?", "!"}
The model automatically normalizes punctuation for each language.
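To illustrate how a sentence-ender set like the one above can be used, here is a minimal, self-contained sketch that splits text on those characters. The split_sentences helper is purely illustrative and is not part of the library; only the sentence_enders set comes from the source above.

```python
# Sentence-ender set shown above (from mtl_tts.py); covers Latin and CJK punctuation.
sentence_enders = {".", "!", "?", "-", ",", "、", ",", "。", "?", "!"}

def split_sentences(text):
    """Split text at language-specific enders, keeping the ender with its chunk.

    Illustrative helper only -- the model's own normalization is internal.
    """
    sentences, current = [], []
    for ch in text:
        current.append(ch)
        if ch in sentence_enders:
            sentences.append("".join(current).strip())
            current = []
    if current:  # trailing text without a final ender
        sentences.append("".join(current).strip())
    return [s for s in sentences if s]

print(split_sentences("你好。今天天气真不错!"))  # ['你好。', '今天天气真不错!']
```

Note that the set includes comma-like characters, so this splits at clause boundaries as well as sentence boundaries.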

Advanced Configuration

Full Parameter Example

wav = model.generate(
    text="Votre texte ici",
    language_id="fr",
    audio_prompt_path="reference.wav",
    exaggeration=0.5,      # Expressiveness
    cfg_weight=0.5,        # Conditioning strength
    temperature=0.8,       # Sampling randomness
    repetition_penalty=2.0, # Reduce repetition
    min_p=0.05,           # Min probability threshold
    top_p=1.0             # Nucleus sampling
)
See the Configuration guide for detailed parameter explanations.

Troubleshooting

Unsupported Language Error

# Error: ValueError: Unsupported language_id 'xx'
languages = ChatterboxMultilingualTTS.get_supported_languages()
print(f"Supported: {list(languages.keys())}")

Poor Pronunciation

  1. Ensure correct language_id - Verify you’re using the right code
  2. Check text encoding - Use UTF-8 for non-Latin scripts
  3. Adjust cfg_weight - Try cfg_weight=0.0 for more native pronunciation
  4. Use native reference audio - Match reference language to target language

Strong Accent

If the accent from reference audio is too strong:
# Minimize accent influence
wav = model.generate(
    text,
    language_id="es",  # your target language code
    audio_prompt_path="reference.wav",
    cfg_weight=0.0  # Removes most accent influence
)
