Introduction

The Model API provides the main interface for working with Qwen3-TTS models. The Qwen3TTSModel class wraps the underlying HuggingFace model and provides three generation methods for different use cases:
  • CustomVoice: Generate speech using predefined speaker IDs
  • VoiceDesign: Generate speech with natural-language style instructions
  • Base: Generate speech by cloning voices from reference audio

Main Components

Qwen3TTSModel

The primary class for model interaction. See Qwen3TTSModel for details.
from qwen_tts import Qwen3TTSModel

# Load a model
model = Qwen3TTSModel.from_pretrained(
    "Qwen3-TTS-CustomVoice-2B",
    device_map="cuda:0"
)

Model Loading

Load models using the from_pretrained() class method, which supports:
  • HuggingFace Hub repository IDs
  • Local model directories
  • Standard HuggingFace loading options (device_map, dtype, etc.)
See Qwen3TTSModel.from_pretrained() for all parameters.

Generation Methods

The Model API provides three generation methods, each designed for a specific model type:

generate_custom_voice

Generate speech using predefined speaker IDs with optional style instructions

generate_voice_design

Generate speech with natural-language voice descriptions

generate_voice_clone

Clone voices from reference audio samples
See Generation Methods for detailed parameter documentation.

Voice Cloning

For voice cloning workflows, use:
  • create_voice_clone_prompt() - Build reusable voice prompts from reference audio
  • VoiceClonePromptItem - Container for voice clone prompt data
See Voice Clone Prompt for usage details.
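Building a prompt once and reusing it avoids re-processing the reference audio on every call. The sketch below illustrates that flow; the keyword and parameter names passed to create_voice_clone_prompt() and generate_voice_clone() are assumptions here, so check Voice Clone Prompt for the authoritative signatures:

```python
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained("Qwen3-TTS-Base-2B", device_map="cuda:0")

# Build a reusable prompt from the reference audio once...
# (argument names are illustrative; see Voice Clone Prompt for the real signature)
prompt = model.create_voice_clone_prompt(
    ref_audio="reference.wav",
    ref_text="Reference audio transcription",
)

# ...then reuse it across generations without re-reading the reference file.
for text in ["First sentence.", "Second sentence."]:
    wavs, sr = model.generate_voice_clone(
        text=text,
        voice_clone_prompt=prompt,  # illustrative parameter name
        language="English",
    )
```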

Utility Methods

Query model capabilities:
# Get supported languages
languages = model.get_supported_languages()

# Get supported speakers (CustomVoice models)
speakers = model.get_supported_speakers()

Quick Start

CustomVoice Model

from qwen_tts import Qwen3TTSModel

# Load model
model = Qwen3TTSModel.from_pretrained(
    "Qwen3-TTS-CustomVoice-2B",
    device_map="cuda:0"
)

# Generate speech
wavs, sr = model.generate_custom_voice(
    text="Hello, world!",
    speaker="aurora",
    language="English"
)

VoiceDesign Model

from qwen_tts import Qwen3TTSModel

# Load model
model = Qwen3TTSModel.from_pretrained(
    "Qwen3-TTS-VoiceDesign-2B",
    device_map="cuda:0"
)

# Generate speech
wavs, sr = model.generate_voice_design(
    text="Welcome to our service.",
    instruct="A professional female voice with a warm tone",
    language="English"
)

Base Model (Voice Cloning)

from qwen_tts import Qwen3TTSModel

# Load model
model = Qwen3TTSModel.from_pretrained(
    "Qwen3-TTS-Base-2B",
    device_map="cuda:0"
)

# Generate speech
wavs, sr = model.generate_voice_clone(
    text="This is a test.",
    ref_audio="reference.wav",
    ref_text="Reference audio transcription",
    language="English"
)

Next Steps

Qwen3TTSModel

Class documentation and initialization methods

Generation Methods

Complete parameter reference for all generation methods

Voice Clone Prompt

Voice cloning workflow and prompt management

Examples

Complete usage examples and tutorials
