Introduction

The Model API provides the main interface for working with Qwen3-TTS models. The Qwen3TTSModel class wraps the underlying HuggingFace model and provides three generation methods for different use cases:
  • CustomVoice: Generate speech using predefined speaker IDs
  • VoiceDesign: Generate speech with natural-language style instructions
  • Base: Generate speech by cloning voices from reference audio

Main Components

Qwen3TTSModel

The primary class for model interaction. See Qwen3TTSModel for details.
from qwen_tts import Qwen3TTSModel

# Load a model
model = Qwen3TTSModel.from_pretrained(
    "Qwen3-TTS-CustomVoice-2B",
    device_map="cuda:0"
)

Model Loading

Load models using the from_pretrained() class method, which supports:
  • HuggingFace Hub repository IDs
  • Local model directories
  • Standard HuggingFace loading options (device_map, dtype, etc.)
See Qwen3TTSModel.from_pretrained() for all parameters.

Generation Methods

The Model API provides three generation methods, each designed for a specific model type:

generate_custom_voice

Generate speech using predefined speaker IDs with optional style instructions

generate_voice_design

Generate speech with natural-language voice descriptions

generate_voice_clone

Clone voices from reference audio samples
See Generation Methods for detailed parameter documentation.

Voice Cloning

For voice cloning workflows, use:
  • create_voice_clone_prompt() - Build reusable voice prompts from reference audio
  • VoiceClonePromptItem - Container for voice clone prompt data
See Voice Clone Prompt for usage details.
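Building a prompt once and reusing it avoids re-processing the reference audio on every call. The sketch below illustrates that flow; the keyword and parameter names passed to create_voice_clone_prompt() and generate_voice_clone() are assumptions here, so check Voice Clone Prompt for the authoritative signatures:

```python
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained("Qwen3-TTS-Base-2B", device_map="cuda:0")

# Build a reusable prompt from the reference audio once...
# (argument names are illustrative; see Voice Clone Prompt for the real signature)
prompt = model.create_voice_clone_prompt(
    ref_audio="reference.wav",
    ref_text="Reference audio transcription",
)

# ...then reuse it across generations without re-reading the reference file.
for text in ["First sentence.", "Second sentence."]:
    wavs, sr = model.generate_voice_clone(
        text=text,
        voice_clone_prompt=prompt,  # illustrative parameter name
        language="English",
    )
```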

Utility Methods

Query model capabilities:
# Get supported languages
languages = model.get_supported_languages()

# Get supported speakers (CustomVoice models)
speakers = model.get_supported_speakers()

Quick Start

CustomVoice Model

from qwen_tts import Qwen3TTSModel

# Load model
model = Qwen3TTSModel.from_pretrained(
    "Qwen3-TTS-CustomVoice-2B",
    device_map="cuda:0"
)

# Generate speech
wavs, sr = model.generate_custom_voice(
    text="Hello, world!",
    speaker="aurora",
    language="English"
)

VoiceDesign Model

from qwen_tts import Qwen3TTSModel

# Load model
model = Qwen3TTSModel.from_pretrained(
    "Qwen3-TTS-VoiceDesign-2B",
    device_map="cuda:0"
)

# Generate speech
wavs, sr = model.generate_voice_design(
    text="Welcome to our service.",
    instruct="A professional female voice with a warm tone",
    language="English"
)

Base Model (Voice Cloning)

from qwen_tts import Qwen3TTSModel

# Load model
model = Qwen3TTSModel.from_pretrained(
    "Qwen3-TTS-Base-2B",
    device_map="cuda:0"
)

# Generate speech
wavs, sr = model.generate_voice_clone(
    text="This is a test.",
    ref_audio="reference.wav",
    ref_text="Reference audio transcription",
    language="English"
)

Next Steps

Qwen3TTSModel

Class documentation and initialization methods

Generation Methods

Complete parameter reference for all generation methods

Voice Clone Prompt

Voice cloning workflow and prompt management

Examples

Complete usage examples and tutorials
