Latest Release
January 22, 2026 - Qwen3-TTS Initial Release
We are excited to announce the initial release of Qwen3-TTS, a series of powerful speech generation models developed by the Qwen team.
Released Models
All models are based on the Qwen3-TTS-Tokenizer-12Hz:
Qwen3-TTS-12Hz-1.7B-CustomVoice
1.7B parameter model with 9 premium speakers and instruction control
Qwen3-TTS-12Hz-1.7B-VoiceDesign
1.7B parameter model with voice design from natural language descriptions
Qwen3-TTS-12Hz-1.7B-Base
1.7B parameter base model with 3-second voice cloning capability
Qwen3-TTS-12Hz-0.6B-CustomVoice
0.6B parameter efficient model with 9 premium speakers
Qwen3-TTS-12Hz-0.6B-Base
0.6B parameter efficient base model with voice cloning
Qwen3-TTS-Tokenizer-12Hz
High-fidelity 12Hz speech tokenizer for encoding and decoding
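As a rough aid for choosing among the checkpoints above, the list can be written down as a small lookup table. This is only a sketch: the model names and parameter sizes are copied from the list, while the `ReleasedModel` class, the capability tags, and `find_models` are hypothetical helpers invented for illustration and are not part of the qwen-tts package.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReleasedModel:
    name: str        # model name exactly as listed in the release notes
    params: str      # parameter count as listed ("1.7B" or "0.6B")
    capability: str  # hypothetical tag summarizing the listed feature


# Names and sizes come from the list above; the capability tags are assumptions.
RELEASED_MODELS = [
    ReleasedModel("Qwen3-TTS-12Hz-1.7B-CustomVoice", "1.7B", "custom_voice"),
    ReleasedModel("Qwen3-TTS-12Hz-1.7B-VoiceDesign", "1.7B", "voice_design"),
    ReleasedModel("Qwen3-TTS-12Hz-1.7B-Base", "1.7B", "voice_clone"),
    ReleasedModel("Qwen3-TTS-12Hz-0.6B-CustomVoice", "0.6B", "custom_voice"),
    ReleasedModel("Qwen3-TTS-12Hz-0.6B-Base", "0.6B", "voice_clone"),
]


def find_models(capability: str, params: Optional[str] = None) -> List[str]:
    """Return the names of released models matching a capability tag and, optionally, a size."""
    return [
        m.name
        for m in RELEASED_MODELS
        if m.capability == capability and (params is None or m.params == params)
    ]


if __name__ == "__main__":
    print(find_models("voice_clone", params="0.6B"))
```

For example, `find_models("custom_voice")` returns both CustomVoice checkpoints, and passing `params="0.6B"` narrows the result to the smaller one.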
Key Features
Powerful Speech Representation
- Self-developed Qwen3-TTS-Tokenizer-12Hz
- Efficient acoustic compression and high-dimensional semantic modeling
- Preserves paralinguistic information and acoustic environmental features
- High-speed, high-fidelity speech reconstruction
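To give a feel for the compression the tokenizer's name suggests, the arithmetic below assumes that "12Hz" means 12 token frames per second of audio. That reading is an assumption, and the number of codebook tokens carried by each frame is not stated here, so the sketch counts frames only.

```python
import math

# Assumption: "12Hz" in the tokenizer name means 12 token frames per second of audio.
FRAME_RATE_HZ = 12


def frame_duration_ms() -> float:
    """Audio duration covered by one token frame, in milliseconds."""
    return 1000.0 / FRAME_RATE_HZ


def frames_for(seconds: float) -> int:
    """Number of token frames needed to cover a clip of the given length."""
    return math.ceil(seconds * FRAME_RATE_HZ)


if __name__ == "__main__":
    print(f"one frame covers about {frame_duration_ms():.1f} ms")
    print(f"a 3-second reference clip spans {frames_for(3.0)} frames")
```

Under that assumption, a 3-second reference clip of the kind used for voice cloning corresponds to 36 frames, each covering roughly 83 ms of audio.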
Universal End-to-End Architecture
- Discrete multi-codebook LM architecture
- Full-information end-to-end speech modeling
- Eliminates information bottlenecks of traditional LM+DiT schemes
- Enhanced versatility, generation efficiency, and performance ceiling
Extreme Low-Latency Streaming
- Dual-Track hybrid streaming generation architecture
- A single model supports both streaming and non-streaming generation
- First audio packet is emitted after a single character of input
- End-to-end synthesis latency as low as 97ms
Intelligent Voice Control
- Natural language instruction-based voice control
- Flexible control over timbre, emotion, and prosody
- Deep text semantic understanding
- Adaptive tone, rhythm, and emotional expression
Comprehensive Language Support
Supports 10 major languages:
- Chinese (Mandarin)
- English
- Japanese
- Korean
- German
- French
- Russian
- Portuguese
- Spanish
- Italian
Premium Speakers
CustomVoice models include 9 carefully crafted speakers:

| Speaker | Voice Description | Native Language |
|---|---|---|
| Vivian | Bright, slightly edgy young female voice | Chinese |
| Serena | Warm, gentle young female voice | Chinese |
| Uncle_Fu | Seasoned male voice with low, mellow timbre | Chinese |
| Dylan | Youthful Beijing male voice, clear and natural | Chinese (Beijing) |
| Eric | Lively Chengdu male voice, husky brightness | Chinese (Sichuan) |
| Ryan | Dynamic male voice with strong rhythmic drive | English |
| Aiden | Sunny American male voice, clear midrange | English |
| Ono_Anna | Playful Japanese female voice, light and nimble | Japanese |
| Sohee | Warm Korean female voice with rich emotion | Korean |
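When picking a speaker programmatically, the table can be mirrored as a plain mapping. The sketch below copies speaker names and native languages from the table; the `SPEAKERS` dictionary and the `speakers_for` helper are hypothetical and do not come from the qwen-tts package.

```python
from typing import List

# Speaker name -> native language, copied from the table above.
SPEAKERS = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese (Beijing)",
    "Eric": "Chinese (Sichuan)",
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}


def speakers_for(language: str) -> List[str]:
    """List speakers whose native language starts with the given name.

    Prefix matching means "Chinese" also returns the Beijing and Sichuan dialect speakers.
    """
    return [name for name, native in SPEAKERS.items() if native.startswith(language)]


if __name__ == "__main__":
    print(speakers_for("English"))
```

Languages without a native speaker in the table, such as German, return an empty list, so a caller would need its own fallback choice.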
Performance Highlights
- Best-in-class English synthesis: WER of 1.24 on Seed-TTS test-en
- Competitive Chinese synthesis: WER of 0.77 on Seed-TTS test-zh
- State-of-the-art voice design: Leading performance on InstructTTSEval
- Strong cross-lingual synthesis: Excellent results on language transfer tasks
- Multilingual excellence: Competitive performance across all 10 languages
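For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the transcript recognized from the synthesized audio, divided by the reference length. The sketch below implements that standard definition and reports a percentage, on the assumption that the figures above are percentages. It is not the Seed-TTS evaluation pipeline, and whitespace splitting only makes sense for languages such as English.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level edit distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(
                min(
                    prev[j] + 1,                           # deletion
                    cur[j - 1] + 1,                        # insertion
                    prev[j - 1] + (ref_word != hyp_word),  # substitution or match
                )
            )
        prev = cur
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

In the usage line, one substituted word out of six reference words gives a WER of about 16.7, which shows how small the reported scores are by comparison.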
Resources
Technical Paper
Read the research paper on arXiv
Blog Post
Detailed introduction and use cases
Hugging Face
Download models from Hugging Face
ModelScope
Download models from ModelScope (CN)
GitHub Repository
Source code and examples
Live Demo (HF)
Try Qwen3-TTS in your browser
Installation
Quick Start
What’s Coming
Additional models mentioned in the technical report will be released in the near future. Stay tuned!
Version History
qwen-tts Package
v0.1.1 - Current Release
Package Information:
- Python support: 3.9, 3.10, 3.11, 3.12, 3.13
- License: Apache-2.0
- Command-line tool: qwen-tts-demo
Dependencies:
- transformers 4.57.3
- accelerate 1.12.0
- gradio (latest)
- librosa, torchaudio, soundfile
- onnxruntime, einops
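Before installing, it can help to confirm that the running interpreter is one of the supported Python versions listed above. The following is a minimal sketch using only the standard library; `SUPPORTED_PYTHONS` and `is_supported` are hypothetical names, and the check is not something the qwen-tts package is known to perform itself.

```python
import sys

# Supported (major, minor) versions, copied from the package information above.
SUPPORTED_PYTHONS = {(3, 9), (3, 10), (3, 11), (3, 12), (3, 13)}


def is_supported(version=None) -> bool:
    """Return True if the given version tuple (default: the running interpreter) is supported."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor in SUPPORTED_PYTHONS


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported() else "not listed as supported"
    print(f"Python {current} is {status}")
```

Passing an explicit tuple, for example `is_supported((3, 8))`, makes the helper easy to use in tests or CI matrices without changing interpreters.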
Related Releases
DashScope API
Production-ready API access for Qwen3-TTS models:
CustomVoice API
Real-time streaming API
Voice Clone API
Voice cloning API
Voice Design API
Voice design API
vLLM-Omni Support
vLLM officially provides day-0 support for Qwen3-TTS:
- Optimized inference engine
- Better throughput and latency
- Offline inference (online serving coming soon)
- See vLLM-Omni documentation
Stay Updated
GitHub Releases
Watch for new releases
Discord Community
Join discussions
WeChat Group
Chinese community
Subscribe to the GitHub repository to get notified about new releases and updates.