
Latest Release

January 22, 2026 - Qwen3-TTS Initial Release

We are excited to announce the initial release of Qwen3-TTS, a series of powerful speech generation models developed by the Qwen team.

Released Models

All models are based on the Qwen3-TTS-Tokenizer-12Hz:

Qwen3-TTS-12Hz-1.7B-CustomVoice

1.7B parameter model with 9 premium speakers and instruction control

Qwen3-TTS-12Hz-1.7B-VoiceDesign

1.7B parameter model with voice design from natural language descriptions

Qwen3-TTS-12Hz-1.7B-Base

1.7B parameter base model with 3-second voice cloning capability

Qwen3-TTS-12Hz-0.6B-CustomVoice

0.6B parameter efficient model with 9 premium speakers

Qwen3-TTS-12Hz-0.6B-Base

0.6B parameter efficient base model with voice cloning

Qwen3-TTS-Tokenizer-12Hz

High-fidelity 12Hz speech tokenizer for encoding and decoding

Key Features

Speech tokenizer
  • Self-developed Qwen3-TTS-Tokenizer-12Hz
  • Efficient acoustic compression and high-dimensional semantic modeling
  • Preserves paralinguistic information and acoustic-environment features
  • High-speed, high-fidelity speech reconstruction

Model architecture
  • Discrete multi-codebook LM architecture
  • Full-information end-to-end speech modeling
  • Eliminates the information bottleneck of traditional LM+DiT schemes
  • Improved versatility, generation efficiency, and performance ceiling

Streaming
  • Dual-track hybrid streaming generation architecture
  • A single model supports both streaming and non-streaming synthesis
  • First audio packet can be emitted after as little as a single character of input
  • End-to-end synthesis latency as low as 97 ms

Instruction control
  • Natural-language instruction-based voice control
  • Flexible control over timbre, emotion, and prosody
  • Deep text semantic understanding with adaptive tone, rhythm, and emotional expression

Supports 10 major languages:
  • Chinese (Mandarin)
  • English
  • Japanese
  • Korean
  • German
  • French
  • Russian
  • Portuguese
  • Spanish
  • Italian
Plus Chinese dialects: Beijing, Sichuan
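The 12 Hz tokenizer rate puts the quoted latency figure in context: each token accounts for roughly 83 ms of audio, so a 97 ms first-packet latency is on the order of a single token's worth of speech. A quick arithmetic sketch (the derivation below is ours, not from the release notes):

```python
# At a 12 Hz token rate, one speech token represents 1/12 s of audio.
TOKEN_RATE_HZ = 12

ms_of_audio_per_token = 1000 / TOKEN_RATE_HZ   # ~83.3 ms of audio per token
tokens_for_10s_clip = 10 * TOKEN_RATE_HZ       # token count for a 10 s clip

print(f"{ms_of_audio_per_token:.1f} ms of audio per token")
print(f"{tokens_for_10s_clip} tokens for a 10 s clip")
```

The low token rate is what makes both fast reconstruction and low first-packet latency plausible: very few tokens need to be generated before audible output exists.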

Premium Speakers

CustomVoice models include 9 carefully crafted speakers:
Speaker     Voice Description                                 Native Language
Vivian      Bright, slightly edgy young female voice          Chinese
Serena      Warm, gentle young female voice                   Chinese
Uncle_Fu    Seasoned male voice with low, mellow timbre       Chinese
Dylan       Youthful Beijing male voice, clear and natural    Chinese (Beijing)
Eric        Lively Chengdu male voice, husky brightness       Chinese (Sichuan)
Ryan        Dynamic male voice with strong rhythmic drive     English
Aiden       Sunny American male voice, clear midrange         English
Ono_Anna    Playful Japanese female voice, light and nimble   Japanese
Sohee       Warm Korean female voice with rich emotion        Korean
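For programmatic speaker selection, the table above can be mirrored as a small lookup. Only the speaker names and languages come from the table; the mapping and helper function are our own illustration:

```python
# Speaker -> native language, transcribed from the speaker table.
SPEAKERS = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese (Beijing)",
    "Eric": "Chinese (Sichuan)",
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}

def speakers_for(language: str) -> list[str]:
    """Return speakers whose native language starts with `language`,
    so "Chinese" also matches the Beijing and Sichuan dialect voices."""
    return [name for name, lang in SPEAKERS.items() if lang.startswith(language)]

print(speakers_for("English"))   # ['Ryan', 'Aiden']
print(speakers_for("Chinese"))   # all five Chinese voices, dialects included
```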

Performance Highlights

  • Best-in-class English synthesis: WER of 1.24 on Seed-TTS test-en
  • Competitive Chinese synthesis: WER of 0.77 on Seed-TTS test-zh
  • State-of-the-art voice design: Leading performance on InstructTTSEval
  • Strong cross-lingual: Excellent results on language transfer tasks
  • Multilingual excellence: Competitive performance across all 10 languages
See the Benchmarks page for detailed results.
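The WER figures above follow the usual definition: word error rate = (substitutions + deletions + insertions) / reference word count, reported as a percentage (so 1.24 means 1.24%). A minimal word-level edit-distance implementation, as a sketch rather than the benchmark's official scorer:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word out of five -> WER 0.2 (i.e. 20%)
print(wer("welcome to qwen three tts", "welcome to quen three tts"))
```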

Resources

Technical Paper

Read the research paper on arXiv

Blog Post

Detailed introduction and use cases

Hugging Face

Download models from Hugging Face

ModelScope

Download models from ModelScope (CN)

GitHub Repository

Source code and examples

Live Demo (HF)

Try Qwen3-TTS in your browser

Installation

# Install via PyPI
pip install qwen-tts

# Or install from source
git clone https://github.com/QwenLM/Qwen3-TTS.git
cd Qwen3-TTS
pip install -e .

# Optional: Install FlashAttention for better performance
pip install flash-attn --no-build-isolation

Quick Start

import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

# Load CustomVoice model
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# Generate speech
wavs, sr = model.generate_custom_voice(
    text="Hello! Welcome to Qwen3-TTS.",
    language="English",
    speaker="Ryan",
    instruct="Say it with enthusiasm",
)

sf.write("output.wav", wavs[0], sr)
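Building on the Quick Start, the `instruct` parameter can be swept to audition different deliveries of the same line. Only the argument assembly below runs without the model; the synthesis loop (commented out) reuses the `model` object from the Quick Start, and the instruction phrasings are our own examples:

```python
def build_requests(text, speaker, language, instructions):
    """Return one generate_custom_voice() kwargs dict per instruction prompt."""
    return [
        {"text": text, "language": language, "speaker": speaker, "instruct": inst}
        for inst in instructions
    ]

requests = build_requests(
    text="Hello! Welcome to Qwen3-TTS.",
    speaker="Ryan",
    language="English",
    instructions=[
        "Say it with enthusiasm",
        "Whisper softly, as if sharing a secret",
        "Speak slowly and calmly",
    ],
)

# Synthesis loop (requires the loaded model from the Quick Start):
# for i, kwargs in enumerate(requests):
#     wavs, sr = model.generate_custom_voice(**kwargs)
#     sf.write(f"output_{i}.wav", wavs[0], sr)
```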

What’s Coming

Additional models mentioned in the technical report will be released in the near future. Stay tuned!

Version History

qwen-tts Package

v0.1.1 - Current Release

Package Information:
  • Python support: 3.9, 3.10, 3.11, 3.12, 3.13
  • License: Apache-2.0
  • Command-line tool: qwen-tts-demo
Dependencies:
  • transformers 4.57.3
  • accelerate 1.12.0
  • gradio (latest)
  • librosa, torchaudio, soundfile
  • onnxruntime, einops

DashScope API

Production-ready API access for Qwen3-TTS models:

  • CustomVoice API - real-time streaming synthesis with the premium speakers
  • Voice Clone API - voice cloning from a short reference sample
  • Voice Design API - voice creation from natural-language descriptions

vLLM-Omni Support

vLLM officially provides day-0 support for Qwen3-TTS:
  • Optimized inference engine
  • Better throughput and latency
  • Offline inference (online serving coming soon)
  • See vLLM-Omni documentation

Stay Updated

GitHub Releases

Watch for new releases

Discord Community

Join discussions

WeChat Group

Chinese community

Subscribe to the GitHub repository to get notified about new releases and updates.
