Overview
Qwen3-TTS provides comprehensive multilingual support, covering 10 major languages with native-quality synthesis. The models are trained on diverse multilingual data and support cross-lingual voice cloning and generation.
Supported Languages
Chinese
Mandarin (普通话), Beijing dialect, Sichuan dialect
English
American, British, and neutral accents
Japanese
Standard Japanese (標準語)
Korean
Standard Korean (표준어)
German
Standard German (Hochdeutsch)
French
European French (français européen)
Russian
Standard Russian (русский)
Portuguese
Brazilian and European Portuguese
Spanish
European and Latin American Spanish
Italian
Standard Italian (italiano)
Language Quality Comparison
Content Consistency (WER/CER ↓)
Word Error Rate (WER) or Character Error Rate (CER) on the multilingual test set (lower is better):

| Language | 1.7B-Base | 0.6B-Base | Quality Tier |
|---|---|---|---|
| Chinese | 0.928 | 1.145 | ⭐⭐⭐ Excellent |
| English | 0.934 | 0.836 | ⭐⭐⭐ Excellent |
| Korean | 1.755 | 1.741 | ⭐⭐⭐ Excellent |
| German | 1.235 | 1.089 | ⭐⭐ Very Good |
| Italian | 0.948 | 1.534 | ⭐⭐ Very Good |
| Portuguese | 1.526 | 2.254 | ⭐⭐ Very Good |
| Spanish | 1.126 | 1.491 | ⭐⭐ Very Good |
| French | 2.858 | 2.931 | ⭐ Good |
| Russian | 3.212 | 4.458 | ⭐ Good |
| Japanese | 3.823 | 6.404 | ⭐ Good |
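Chinese, English, and Korean are rated in the top tier for content accuracy, which makes them good candidates for production applications that need high precision. The tiers do not follow the 1.7B numbers strictly: Italian (0.948), Spanish (1.126), and German (1.235) all score lower than Korean (1.755) on that model.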
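As a rough illustration of how the scores in this table are defined, the sketch below computes WER and CER from a reference transcript and a hypothesis transcript using edit distance. The function names are hypothetical and not part of Qwen3-TTS. A real evaluation would first transcribe the generated audio with a speech recognizer and normalize the text, and neither step is shown here. Results are returned as percentages, on the assumption that the table values are percentages.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete an item from the reference
                curr[j - 1] + 1,     # insert an item from the hypothesis
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, splitting on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, ignoring whitespace."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

For example, `cer("你好世界", "你好世间")` returns `25.0`: one substituted character out of four. CER is the usual choice for languages written without spaces between words, which is why the table mixes the two metrics.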
Speaker Similarity (Cosine Similarity ↑)
Speaker embedding similarity on voice cloning tasks (higher is better):

| Language | 1.7B-Base | 0.6B-Base | Quality Tier |
|---|---|---|---|
| English | 0.775 | 0.829 | ⭐⭐⭐ Excellent |
| Portuguese | 0.817 | 0.794 | ⭐⭐⭐ Excellent |
| Spanish | 0.814 | 0.812 | ⭐⭐⭐ Excellent |
| Italian | 0.817 | 0.792 | ⭐⭐⭐ Excellent |
| Chinese | 0.799 | 0.811 | ⭐⭐⭐ Excellent |
| Korean | 0.799 | 0.812 | ⭐⭐⭐ Excellent |
| Russian | 0.792 | 0.781 | ⭐⭐⭐ Excellent |
| Japanese | 0.788 | 0.798 | ⭐⭐ Very Good |
| German | 0.775 | 0.769 | ⭐⭐ Very Good |
| French | 0.714 | 0.700 | ⭐⭐ Very Good |
All languages reach a speaker similarity of at least 0.70 on both models, indicating strong voice cloning across the board. French is the lowest, at 0.700 to 0.714.
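The metric itself is simple to state. The sketch below computes cosine similarity between two speaker embeddings represented as plain lists of floats. Which speaker-embedding model the benchmark used is not stated here, so the vectors in the usage note are toy values and `cosine_similarity` is a hypothetical helper, not a Qwen3-TTS function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

In a cloning evaluation, `a` would be the embedding of the reference recording and `b` the embedding of the synthesized audio. Identical directions score 1.0 and orthogonal vectors score 0.0, so `cosine_similarity([1.0, 0.0], [0.0, 1.0])` is `0.0`.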
Speaker Native Languages
For CustomVoice models, the following 9 premium speakers are available:

| Speaker | Voice Description | Native Language | Recommended Languages |
|---|---|---|---|
| Vivian | Bright, slightly edgy young female | Chinese | Chinese, English |
| Serena | Warm, gentle young female | Chinese | Chinese, English |
| Uncle_Fu | Seasoned male, low mellow timbre | Chinese | Chinese |
| Dylan | Youthful Beijing male, clear natural | Chinese (Beijing) | Chinese, English |
| Eric | Lively Chengdu male, slightly husky | Chinese (Sichuan) | Chinese |
| Ryan | Dynamic male, strong rhythmic drive | English | English, Chinese |
| Aiden | Sunny American male, clear midrange | English | English |
| Ono_Anna | Playful Japanese female, light nimble | Japanese | Japanese, English |
| Sohee | Warm Korean female, rich emotion | Korean | Korean, English |
Cross-Lingual Capabilities
Qwen3-TTS supports cross-lingual voice cloning, allowing you to clone a voice in one language and generate speech in another.
Cross-Lingual Performance
Mixed Error Rate (WER for English, CER for others) on the cross-lingual benchmark (lower is better):

| Task | 1.7B-Base | 0.6B-Base | Quality |
|---|---|---|---|
| Korean → English | 3.09 | 3.48 | ⭐⭐⭐ Excellent |
| Japanese → English | 3.04 | 3.95 | ⭐⭐⭐ Excellent |
| English → Chinese | 4.77 | 5.66 | ⭐⭐ Very Good |
| Korean → Japanese | 3.67 | 4.17 | ⭐⭐ Very Good |
| Korean → Chinese | 4.82 | 8.12 | ⭐⭐ Very Good |
| English → Korean | 5.14 | 6.83 | ⭐⭐ Very Good |
| Japanese → Korean | 5.59 | 6.86 | ⭐⭐ Very Good |
| English → Japanese | 7.21 | 7.74 | ⭐ Good |
| Chinese → Japanese | 8.40 | 9.29 | ⭐ Good |
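The 1.7B model has the lower error rate on every pair in this table. The gap is largest for Korean → Chinese (4.82 vs. 8.12) and for Korean targets (about 1.3 to 1.7 points); it is under 1 point for the remaining pairs.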
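To compare the two model sizes on the table above, the following sketch copies the error rates into a dictionary and computes the per-pair gap and the mean gap per target language. The names `CROSS_LINGUAL_ERROR`, `model_gaps`, and `mean_gap_by_target` are illustrative only.

```python
from collections import defaultdict

# (source, target) -> (1.7B-Base, 0.6B-Base), copied from the table above
CROSS_LINGUAL_ERROR = {
    ("Korean", "English"): (3.09, 3.48),
    ("Japanese", "English"): (3.04, 3.95),
    ("English", "Chinese"): (4.77, 5.66),
    ("Korean", "Japanese"): (3.67, 4.17),
    ("Korean", "Chinese"): (4.82, 8.12),
    ("English", "Korean"): (5.14, 6.83),
    ("Japanese", "Korean"): (5.59, 6.86),
    ("English", "Japanese"): (7.21, 7.74),
    ("Chinese", "Japanese"): (8.40, 9.29),
}


def model_gaps():
    """Map each pair to (0.6B error - 1.7B error); positive means 1.7B is better."""
    return {
        pair: round(small - large, 2)
        for pair, (large, small) in CROSS_LINGUAL_ERROR.items()
    }


def mean_gap_by_target():
    """Average the per-pair gap over all pairs that share a target language."""
    buckets = defaultdict(list)
    for (_source, target), gap in model_gaps().items():
        buckets[target].append(gap)
    return {target: sum(gaps) / len(gaps) for target, gaps in buckets.items()}
```

Sorting `model_gaps()` by value shows where the larger model helps most, which can inform the choice between the two sizes for a given language pair.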
Cross-Lingual Example
Language-Specific Considerations
Chinese (中文)
Strengths:
- Excellent accuracy (WER ~0.93 for 1.7B)
- Strong dialect support (Beijing, Sichuan)
- Native speakers available (Vivian, Serena, Uncle_Fu, Dylan, Eric)
Considerations:
- Tone accuracy is critical; may occasionally flatten in complex prosody
- Text input should use simplified or traditional Chinese consistently
- Pinyin input not officially supported
English
Strengths:
- Excellent accuracy (WER ~0.93 for 1.7B)
- Multiple native speakers (Ryan, Aiden)
- Strong cross-lingual source language
Considerations:
- Accents: models default to neutral/American accent
- British English: supported but may sound slightly American-influenced
- Contractions and informal speech handled well
Japanese (日本語)
Strengths:
- Native speaker available (Ono_Anna)
- Good speaker similarity (0.788)
Considerations:
- Higher character error rate (~3.8-6.4%)
- Pitch accent may not always be perfect
- Kanji, hiragana, and katakana all supported
Korean (한국어)
Strengths:
- Excellent accuracy (WER ~1.75)
- Native speaker available (Sohee)
- Strong cross-lingual capabilities
Considerations:
- Hangul input only (no romanization)
- Handles formal and informal speech
German (Deutsch)
Strengths:
- Good accuracy (WER ~1.09-1.24)
- Good speaker similarity (0.77)
- Compound words handled well
Considerations:
- Umlauts (ä, ö, ü) supported
- May occasionally anglicize pronunciation
French (Français)
Strengths:
- Handles liaison and elision
- Reasonable accuracy for European languages
Considerations:
- Moderate error rate (~2.86-2.93)
- Nasal vowels may be approximated
- Accents (é, è, ê, etc.) should be included
Russian (Русский)
Strengths:
- Strong speaker similarity (0.79)
- Handles Cyrillic script
Considerations:
- Moderate error rate (~3.2-4.5)
- Stress patterns may not always be perfect
- Cyrillic input required
Portuguese (Português)
Strengths:
- Excellent speaker similarity (0.817)
- Supports both Brazilian and European variants
Considerations:
- Error rate ~1.5-2.3
- Diacritics (ã, õ, ç) should be included
- May default to Brazilian pronunciation
Spanish (Español)
Strengths:
- Excellent speaker similarity (0.814)
- Good accuracy (WER ~1.13-1.49)
- Supports European and Latin American variants
Considerations:
- Accent marks (á, é, í, ó, ú) should be included
- ñ character supported
- May default to Castilian pronunciation
Italian (Italiano)
Strengths:
- Excellent speaker similarity (0.817)
- Good accuracy (WER ~0.95-1.53)
- Handles double consonants well
Considerations:
- Accent marks (à, è, é, ì, ò, ù) should be included
- Regional accents not explicitly supported
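Several of the notes above concern the script of the input text (Hangul only for Korean, Cyrillic for Russian, no Pinyin for Chinese). One way to catch such problems before synthesis is a simple script check. The sketch below is an assumption-laden illustration, not part of Qwen3-TTS: `EXPECTED_SCRIPTS`, `script_of`, and `unexpected_scripts` are hypothetical names, the Unicode ranges cover only the most common blocks, and the result is meant as a warning rather than a hard error, since legitimate text often mixes scripts (brand names, abbreviations).

```python
# Scripts each language is expected to use, inferred from the notes above (assumption)
EXPECTED_SCRIPTS = {
    "Chinese": {"Han"},
    "Japanese": {"Han", "Kana"},
    "Korean": {"Hangul"},
    "Russian": {"Cyrillic"},
    "English": {"Latin"},
    "German": {"Latin"},
    "French": {"Latin"},
    "Portuguese": {"Latin"},
    "Spanish": {"Latin"},
    "Italian": {"Latin"},
}


def script_of(ch):
    """Return a coarse script name for a character, or None for digits, punctuation, etc."""
    code = ord(ch)
    if 0xAC00 <= code <= 0xD7A3 or 0x1100 <= code <= 0x11FF or 0x3130 <= code <= 0x318F:
        return "Hangul"
    if 0x3040 <= code <= 0x30FF:  # Hiragana and Katakana blocks
        return "Kana"
    if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs
        return "Han"
    if 0x0400 <= code <= 0x04FF:
        return "Cyrillic"
    if code < 0x0250 and ch.isalpha():  # Basic Latin through Latin Extended-B
        return "Latin"
    return None


def unexpected_scripts(text, language):
    """Return a sorted list of scripts found in text that the language is not expected to use."""
    expected = EXPECTED_SCRIPTS[language]
    found = {script_of(ch) for ch in text} - {None}
    return sorted(found - expected)
```

With this sketch, `unexpected_scripts("annyeonghaseyo", "Korean")` returns `["Latin"]`, flagging romanized input, while `unexpected_scripts("안녕하세요", "Korean")` returns an empty list. The check says nothing about missing diacritics in Latin-script languages, which still need to be verified separately.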
Automatic Language Detection
Qwen3-TTS supports automatic language detection when language="Auto" is specified.
Multilingual Generation Tips
Choose the Right Speaker
Use native speakers for best quality:
- Chinese text → Vivian, Serena, Uncle_Fu, Dylan, Eric
- English text → Ryan, Aiden
- Japanese text → Ono_Anna
- Korean text → Sohee
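The mapping above can be kept as data and queried when choosing a speaker. The sketch below copies the native and recommended languages from the speaker table in this document; the `SPEAKERS` structure and the `speakers_for` helper are hypothetical and only illustrate the lookup, they are not part of the Qwen3-TTS API.

```python
# Speaker -> native language and recommended languages, copied from the speaker table.
# Dylan (Beijing) and Eric (Sichuan) are listed under "Chinese" here.
SPEAKERS = {
    "Vivian": {"native": "Chinese", "recommended": ["Chinese", "English"]},
    "Serena": {"native": "Chinese", "recommended": ["Chinese", "English"]},
    "Uncle_Fu": {"native": "Chinese", "recommended": ["Chinese"]},
    "Dylan": {"native": "Chinese", "recommended": ["Chinese", "English"]},
    "Eric": {"native": "Chinese", "recommended": ["Chinese"]},
    "Ryan": {"native": "English", "recommended": ["English", "Chinese"]},
    "Aiden": {"native": "English", "recommended": ["English"]},
    "Ono_Anna": {"native": "Japanese", "recommended": ["Japanese", "English"]},
    "Sohee": {"native": "Korean", "recommended": ["Korean", "English"]},
}


def speakers_for(language):
    """Return (native speakers, other recommended speakers) for a text language."""
    native = [name for name, info in SPEAKERS.items() if info["native"] == language]
    others = [
        name
        for name, info in SPEAKERS.items()
        if language in info["recommended"] and info["native"] != language
    ]
    return native, others
```

For English text, `speakers_for("English")` returns `(["Ryan", "Aiden"], ["Vivian", "Serena", "Dylan", "Ono_Anna", "Sohee"])`. For languages with no native premium speaker, such as German, both lists are empty and the choice has to be made by listening tests.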
Language Roadmap
Future language support (tentative):
- Arabic (العربية)
- Hindi (हिन्दी)
- Turkish (Türkçe)
- Vietnamese (Tiếng Việt)
- Thai (ไทย)
Next Steps
Voice Cloning
Learn how to clone voices across languages
Custom Voice
Use premium speakers in different languages
Voice Design
Create voices with language-specific characteristics
Examples
See multilingual code examples