Overview
Chatterbox-Multilingual extends the base Chatterbox model to 23+ languages, which suits it to global applications and localization projects. At 500M parameters, it maintains high-quality voice cloning while adding multilingual support.
23+ Languages
Support for major world languages including European, Asian, and Middle Eastern languages.
Zero-Shot Cloning
Clone voices across languages without fine-tuning or training.
CFG Control
Same advanced controls as the base model for adjusting the output.
Cross-Language
Transfer voices across different languages while preserving characteristics.
Model Specifications
- Model Size: 500M parameters
- Languages: 23+ supported languages
- Sample Rate: 24,000 Hz
- Architecture: T3 transformer (multilingual config) + S3Gen decoder
- Repository: ResembleAI/chatterbox
Supported Languages
The multilingual model supports the following 23 languages:
European Languages
- Danish (da)
- German (de)
- Greek (el)
- English (en)
- Spanish (es)
- Finnish (fi)
- French (fr)
- Italian (it)
- Dutch (nl)
- Norwegian (no)
- Polish (pl)
- Portuguese (pt)
- Russian (ru)
- Swedish (sv)
Asian Languages
- Chinese (zh)
- Japanese (ja)
- Korean (ko)
- Hindi (hi)
- Malay (ms)
Middle Eastern & African Languages
- Arabic (ar)
- Hebrew (he)
- Turkish (tr)
- Swahili (sw)
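A small lookup table can catch an unsupported code before a request reaches the model. The sketch below is illustrative only: `SUPPORTED_LANGUAGES` is transcribed from the list above rather than read from the library, and `normalize_language_id` is a hypothetical helper, not part of the Chatterbox API.

```python
# Language codes transcribed from the list above (not read from the library).
SUPPORTED_LANGUAGES = {
    # European
    "da": "Danish", "de": "German", "el": "Greek", "en": "English",
    "es": "Spanish", "fi": "Finnish", "fr": "French", "it": "Italian",
    "nl": "Dutch", "no": "Norwegian", "pl": "Polish", "pt": "Portuguese",
    "ru": "Russian", "sv": "Swedish",
    # Asian
    "zh": "Chinese", "ja": "Japanese", "ko": "Korean", "hi": "Hindi",
    "ms": "Malay",
    # Middle Eastern & African
    "ar": "Arabic", "he": "Hebrew", "tr": "Turkish", "sw": "Swahili",
}


def normalize_language_id(code: str) -> str:
    """Hypothetical helper: trim and lowercase a code, rejecting unknown ones."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized
```

Normalizing at the boundary means user input such as `" FR "` resolves to `"fr"`, while a typo fails early with a readable error.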
Hardware Requirements
Minimum (CPU)
- 6GB RAM
- CPU inference supported
- Slower generation times
Recommended (GPU)
- NVIDIA GPU with 6GB+ VRAM
- CUDA support
- Near real-time generation
The model also supports Apple Silicon (MPS) for Mac users with M1/M2/M3 chips.
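One way to pick among the three backends mentioned above is to prefer CUDA, then MPS, then CPU. This is a minimal sketch: `choose_device` and `detect_device` are hypothetical helper names, and the preference order is an assumption rather than something the model requires.

```python
def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer an NVIDIA GPU, then Apple Silicon (MPS), then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends (torch is imported lazily)."""
    import torch

    return choose_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

Keeping the decision in a pure function makes the fallback order easy to test on machines without a GPU.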
Usage
Basic Generation
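A minimal sketch of single-language generation follows. The import path `chatterbox.mtl_tts`, the class `ChatterboxMultilingualTTS`, and its `from_pretrained(device=...)` constructor are assumptions based on the public ResembleAI/chatterbox repository, so check them against your installed version. The helper names (`load_model`, `synthesize`, `save_wav`) are hypothetical.

```python
def load_model(device: str = "cpu"):
    """Load the multilingual model.

    Assumption: import path and class name follow the public repository.
    """
    from chatterbox.mtl_tts import ChatterboxMultilingualTTS

    return ChatterboxMultilingualTTS.from_pretrained(device=device)


def synthesize(model, text: str, language_id: str):
    """Generate a waveform; language_id is required by the multilingual model."""
    if not language_id:
        raise ValueError("language_id is required for the multilingual model")
    return model.generate(text, language_id=language_id)


def save_wav(path: str, wav, sample_rate: int = 24_000) -> None:
    """Write the waveform to disk with torchaudio (imported lazily)."""
    import torchaudio

    torchaudio.save(path, wav, sample_rate)
```

Typical use would be `model = load_model("cuda")` followed by `save_wav("bonjour.wav", synthesize(model, "Bonjour, comment ça va ?", "fr"))`. The 24,000 Hz default matches the sample rate in the model specifications.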
Multiple Languages
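Because the language is chosen per call, one loaded model can serve several languages in a loop. The sketch below assumes `model` is an already loaded Chatterbox-Multilingual instance whose `generate` method accepts `language_id`; `synthesize_batch` and `SAMPLE_TEXTS` are hypothetical names used for illustration.

```python
from typing import Any, Dict

# Illustrative inputs keyed by two-letter language code.
SAMPLE_TEXTS = {
    "fr": "Bonjour, comment ça va ?",
    "es": "Hola, ¿cómo estás?",
    "zh": "你好,今天天气真不错。",
}


def synthesize_batch(model: Any, texts_by_language: Dict[str, str]) -> Dict[str, Any]:
    """Generate one waveform per language code, reusing a single loaded model."""
    return {
        language_id: model.generate(text, language_id=language_id)
        for language_id, text in texts_by_language.items()
    }
```

Calling `synthesize_batch(model, SAMPLE_TEXTS)` would return a dictionary of waveforms keyed by language code.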
Voice Cloning Across Languages
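Cross-language cloning amounts to passing a reference clip through `audio_prompt_path` while setting `language_id` to the target language. This is a sketch under the assumption that `model.generate` accepts both keyword arguments, as the parameter table on this page indicates; `clone_voice` is a hypothetical wrapper.

```python
from pathlib import Path


def clone_voice(model, text: str, language_id: str, reference_wav: str):
    """Synthesize `text` in `language_id` using the voice in `reference_wav`."""
    # Fail early with a clear error instead of deep inside the model.
    if not Path(reference_wav).is_file():
        raise FileNotFoundError(f"Reference audio not found: {reference_wav}")
    return model.generate(
        text,
        language_id=language_id,
        audio_prompt_path=reference_wav,
    )
```

For example, an English reference clip could drive Spanish output with `clone_voice(model, "Hola, ¿cómo estás?", "es", "speaker_en.wav")`, where the file name is a placeholder.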
Getting Supported Languages
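The supported-language mapping can be queried at runtime and printed as a table. In this sketch, `ChatterboxMultilingualTTS.get_supported_languages()` is assumed to be a classmethod returning a `{code: name}` dictionary, as in the public repository; verify this against your installed version. `format_language_table` is a hypothetical formatting helper.

```python
def fetch_supported_languages() -> dict:
    """Ask the library for its language map.

    Assumption: get_supported_languages() exists and returns {code: name}.
    """
    from chatterbox.mtl_tts import ChatterboxMultilingualTTS

    return ChatterboxMultilingualTTS.get_supported_languages()


def format_language_table(languages: dict) -> str:
    """Render {code: name} as tab-separated lines sorted by code."""
    return "\n".join(
        f"{code}\t{name}" for code, name in sorted(languages.items())
    )
```

Usage would be `print(format_language_table(fetch_supported_languages()))`.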
Generation Parameters
| Parameter | Default | Description |
|---|---|---|
| `language_id` | Required | Two-letter language code (e.g., `"fr"`, `"es"`, `"zh"`) |
| `temperature` | 0.8 | Controls randomness in token selection |
| `top_p` | 1.0 | Nucleus sampling threshold |
| `min_p` | 0.05 | Minimum probability threshold |
| `repetition_penalty` | 2.0 | Penalizes repeated tokens |
| `cfg_weight` | 0.5 | Classifier-free guidance strength |
| `exaggeration` | 0.5 | Emotional intensity level |
| `audio_prompt_path` | None | Path to reference audio for voice cloning |
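The defaults in the table can be gathered into one object so that a call site only states what it overrides. The dataclass below is a hypothetical convenience wrapper, not part of the library; its field names and default values are copied from the table, and it assumes `generate` accepts them as keyword arguments.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class GenerationParams:
    """Hypothetical container; defaults are copied from the parameter table."""

    language_id: str  # required, so it has no default
    temperature: float = 0.8
    top_p: float = 1.0
    min_p: float = 0.05
    repetition_penalty: float = 2.0
    cfg_weight: float = 0.5
    exaggeration: float = 0.5
    audio_prompt_path: Optional[str] = None

    def to_kwargs(self) -> dict:
        """Keyword arguments for generate(), omitting an unset reference clip."""
        kwargs = asdict(self)
        if kwargs["audio_prompt_path"] is None:
            del kwargs["audio_prompt_path"]
        return kwargs
```

A call might then read `model.generate(text, **GenerationParams("fr", cfg_weight=0.3).to_kwargs())`.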
The `language_id` parameter is required for the multilingual model. Use two-letter ISO 639-1 language codes.
Best Practices
Language-Specific Tips
Chinese (zh)
The model handles Chinese characters and punctuation. Use Chinese punctuation marks such as 。?! for best results.
Japanese (ja)
Hiragana, Katakana, and Kanji are all supported. The model handles mixed-script text automatically.
Arabic (ar)
Right-to-left text is handled correctly. Include diacritical marks in your text for the best pronunciation.
European Languages
The model handles accented characters (é, ñ, ü, etc.) naturally. Include proper accents for accurate pronunciation.
Cross-Language Voice Cloning
- Matching Languages: For best results, use a reference clip in the same language as your target text.
- Accent Transfer: If the reference speaker's accent carries over during cross-language cloning, set `cfg_weight=0`.
- Reference Quality: Use clear, noise-free reference audio for consistent results.
- Default Settings: The default parameters (`exaggeration=0.5`, `cfg_weight=0.5`) work well across all languages.
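The accent-transfer tip can be turned into a simple rule for choosing `cfg_weight`. The sketch below applies `cfg_weight=0` whenever the reference and target languages differ, which is one possible policy and an assumption on top of the guidance above (which suggests the change only if accent transfer is actually heard). `cloning_cfg_weight` is a hypothetical helper.

```python
def cloning_cfg_weight(
    reference_language: str,
    target_language: str,
    default: float = 0.5,
) -> float:
    """Return 0.0 for cross-language cloning, otherwise the default weight."""
    same_language = (
        reference_language.strip().lower() == target_language.strip().lower()
    )
    return default if same_language else 0.0
```

The result can be passed straight through as `cfg_weight=cloning_cfg_weight("en", "fr")`.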
Performance Characteristics
Generation Speed
Similar to the base Chatterbox model, with 10-step decoding. Speed varies slightly by language complexity.
Audio Quality
High-fidelity 24kHz output across all supported languages with natural prosody and intonation.
Built-in Watermarking
Every audio file generated by Chatterbox-Multilingual includes Resemble AI’s Perth (Perceptual Threshold) watermark: an imperceptible neural watermark that survives MP3 compression, audio editing, and common manipulations.
Use Cases
- Global Applications: Build TTS systems for international audiences
- Localization: Create localized content in multiple languages
- Language Learning: Generate pronunciation examples in various languages
- Multilingual Voice Agents: Conversational AI that speaks multiple languages
- Content Translation: Convert written content to speech across languages
- Accessibility: Text-to-speech for global accessibility features
Language Support Details
The model includes comprehensive language support with proper handling of:
- Language-specific punctuation
- Diacritical marks and accents
- Script systems (Latin, Cyrillic, Arabic, Chinese, Japanese, Korean)
- Language-appropriate prosody and intonation
- Cultural speech patterns
Comparison with Other Models
| Feature | Chatterbox | Chatterbox-Turbo | Chatterbox-Multilingual |
|---|---|---|---|
| Parameters | 500M | 350M | 500M |
| Languages | English | English | 23+ |
| CFG Control | Yes | No | Yes |
| Exaggeration | Yes | No | Yes |
| Speed | Medium | Fast | Medium |
| Best For | Creative control | Low latency | Multi-language |
Next Steps
Installation
Install Chatterbox and get started
API Reference
Explore all parameters and methods