Overview

The Qwen3-TTS repository includes comprehensive example scripts demonstrating various use cases and model types. All examples are available in the examples/ directory on GitHub.

Available Examples

CustomVoice Model

Test the CustomVoice model with 9 premium speakers and instruction control

VoiceDesign Model

Generate voices from natural language descriptions

Base Model (Voice Clone)

3-second voice cloning with reference audio

Tokenizer Usage

Encode and decode audio with Qwen3-TTS-Tokenizer

Example Details

test_model_12hz_custom_voice.py

Demonstrates usage of the CustomVoice model with predefined speakers. Features:
  • Single and batch inference
  • Language selection (Chinese, English, etc.)
  • Speaker selection from 9 premium voices
  • Instruction-based control (tone, emotion, style)
Key code snippet:
import torch
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

wavs, sr = model.generate_custom_voice(
    # "I've really noticed that I'm someone who is especially good at
    # reading other people's emotions."
    text="其实我真的有发现,我是一个特别善于观察别人情绪的人。",
    language="Chinese",
    speaker="Vivian",
    # "Say it in a very angry tone"
    instruct="用特别愤怒的语气说",
)

test_model_12hz_voice_design.py

Shows how to design custom voices using natural language descriptions. Features:
  • Voice creation from text descriptions
  • Single and batch generation
  • Multilingual voice design
  • Emotional and stylistic control
Key code snippet:
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

wavs, sr = model.generate_voice_design(
    # "Big brother, you're back! I waited so, so long for you. I want a hug!"
    text="哥哥,你回来啦,人家等了你好久好久了,要抱抱!",
    language="Chinese",
    # "A cutesy, childlike girl's voice: high-pitched with pronounced rises
    # and falls, creating a clingy, affected, deliberately cute effect."
    instruct="体现撒娇稚嫩的萝莉女声,音调偏高且起伏明显,营造出黏人、做作又刻意卖萌的听觉效果。",
)

test_model_12hz_base.py

Comprehensive voice cloning examples with the Base model. Features:
  • Voice cloning from reference audio
  • Single and batch voice cloning
  • Reusable voice clone prompts
  • X-vector only mode
  • Multiple clone modes (ICL and x-vector)
Key code snippet:
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

ref_audio = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone.wav"
ref_text = "Okay. Yeah. I resent you. I love you. I respect you..."

wavs, sr = model.generate_voice_clone(
    text="I am solving the equation: x = [-b ± √(b²-4ac)] / 2a?",
    language="English",
    ref_audio=ref_audio,
    ref_text=ref_text,
)

test_tokenizer_12hz.py

Demonstrates audio encoding and decoding with the tokenizer. Features:
  • Single and batch audio encoding
  • Audio decoding from codes
  • Multiple input formats (URLs, paths, numpy arrays)
  • Dict and list payload handling
Key code snippet:
from qwen_tts import Qwen3TTSTokenizer

tokenizer = Qwen3TTSTokenizer.from_pretrained(
    "Qwen/Qwen3-TTS-Tokenizer-12Hz",
    device_map="cuda:0",
)

# Encode from URL
audio_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/tokenizer_demo_1.wav"
enc = tokenizer.encode(audio_url)

# Decode back to audio
wavs, sr = tokenizer.decode(enc)
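A quick sanity check on encoded sequence lengths: if, as the model name suggests, the 12Hz tokenizer produces roughly 12 codec frames per second of audio, a clip's code count should scale linearly with its duration. The sketch below is an illustrative back-of-the-envelope estimate; the 12 frames/sec rate is an assumption drawn from the name, not a documented constant.

```python
def expected_num_codes(duration_sec, frames_per_sec=12):
    """Estimate the codec sequence length for an audio clip.

    Assumes a fixed frame rate (12 frames/sec inferred from the
    "12Hz" model name).
    """
    return round(duration_sec * frames_per_sec)

# A 10-second clip would yield roughly 120 codes at 12 frames/sec.
print(expected_num_codes(10.0))
```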

Common Patterns

Model Initialization

All examples use a consistent pattern for loading models:
import torch
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    "Qwen/MODEL_NAME",
    device_map="cuda:0",              # GPU device
    dtype=torch.bfloat16,             # Recommended dtype
    attn_implementation="flash_attention_2",  # Optional but recommended
)

Batch Processing

All generation methods support batch inference:
wavs, sr = model.generate_custom_voice(
    text=["First sentence.", "Second sentence."],
    language=["English", "English"],
    speaker=["Ryan", "Aiden"],
    instruct=["", "Very happy."]
)
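Very large batches can exhaust GPU memory. One common pattern is to split the inputs into fixed-size chunks and call the generation method once per chunk; the chunking helper below is plain standard-library Python for illustration, not part of qwen_tts.

```python
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

texts = ["First.", "Second.", "Third.", "Fourth.", "Fifth."]
for batch in chunked(texts, 2):
    # In practice, each batch would be passed to the model, e.g.:
    # wavs, sr = model.generate_custom_voice(text=batch, ...)
    print(batch)
```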

Saving Output

All examples use soundfile for saving audio:
import soundfile as sf

sf.write("output.wav", wavs[0], sr)
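If soundfile is not installed, 16-bit PCM WAV files can also be written with the standard-library wave module. This fallback assumes the output waveform is a 1-D sequence of floats in [-1, 1]; the function name is ours, and `wavs[0]` / `sr` mirror the snippet above.

```python
import struct
import wave

def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)           # mono
        f.setsampwidth(2)           # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)
        frames = b"".join(
            # Clip to [-1, 1], then scale to the int16 range.
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(frames)

# e.g. write_wav_pcm16("output.wav", wavs[0], sr)
```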

Running Examples

Prerequisites

pip install qwen-tts
pip install flash-attn --no-build-isolation  # Optional but recommended

Clone Repository

git clone https://github.com/QwenLM/Qwen3-TTS.git
cd Qwen3-TTS/examples

Run Examples

# CustomVoice example
python test_model_12hz_custom_voice.py

# VoiceDesign example
python test_model_12hz_voice_design.py

# Base model (voice cloning)
python test_model_12hz_base.py

# Tokenizer example
python test_tokenizer_12hz.py
Examples will automatically download model weights on first run. Ensure you have sufficient disk space and a stable internet connection.

Advanced Usage

For more advanced usage patterns, including:
  • Voice design then clone workflow
  • Reusable voice prompts
  • Custom generation parameters
  • Streaming generation
refer to the API Reference and Quickstart Guide.

Troubleshooting

Out of Memory

If you hit CUDA out-of-memory errors, try reducing the batch size or using a smaller model variant (0.6B instead of 1.7B):
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice",  # Smaller model
    device_map="cuda:0",
    dtype=torch.float16,  # Use fp16 instead of bf16
)
FlashAttention Installation Fails

FlashAttention is optional but improves performance. If installation fails, load the model without it:
# Load without FlashAttention
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    # attn_implementation="flash_attention_2",  # Omit this line
)
Slow Model Downloads

Use ModelScope for faster downloads in Mainland China:
pip install modelscope
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
  --local_dir ./models/Qwen3-TTS-12Hz-1.7B-CustomVoice
Then load from local path:
model = Qwen3TTSModel.from_pretrained(
    "./models/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    ...
)
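Before pointing from_pretrained at a local directory, it can help to verify the download actually completed. A minimal sketch using only the standard library; the check for config.json assumes the usual Hugging Face directory layout, and the function name is ours.

```python
from pathlib import Path

def local_model_ready(model_dir):
    """Heuristic check that a downloaded model directory looks usable.

    Assumes the standard Hugging Face layout, where the directory
    contains a config.json at its top level.
    """
    p = Path(model_dir)
    return p.is_dir() and (p / "config.json").is_file()

print(local_model_ready("./models/Qwen3-TTS-12Hz-1.7B-CustomVoice"))
```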
