
Class

Qwen3TTSModel is a HuggingFace-style wrapper for Qwen3-TTS models that provides:
  • from_pretrained() initialization via AutoModel/AutoProcessor
  • Generation APIs for CustomVoice, VoiceDesign, and Base models
  • Consistent output format: (wavs: List[np.ndarray], sample_rate: int)
  • Language and speaker validation
Source: qwen_tts/inference/qwen3_tts_model.py:54-878

Constructor

Qwen3TTSModel(
    model: Qwen3TTSForConditionalGeneration,
    processor: Qwen3TTSProcessor,
    generate_defaults: Optional[Dict[str, Any]] = None
)
The constructor is typically not called directly. Use from_pretrained() instead.
model
Qwen3TTSForConditionalGeneration
required
The underlying HuggingFace TTS model instance.
processor
Qwen3TTSProcessor
required
The model’s text processor.
generate_defaults
Optional[Dict[str, Any]]
default: None
Default generation parameters loaded from generate_config.json.
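As a sketch of how generate_defaults might interact with per-call arguments (the merge policy shown here is an assumption for illustration, not taken from the source; the wrapper's actual logic may differ):

```python
from typing import Any, Dict, Optional

def merge_generate_kwargs(
    generate_defaults: Optional[Dict[str, Any]],
    call_kwargs: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge defaults (as loaded from generate_config.json) with per-call overrides.

    Per-call keyword arguments take precedence over the stored defaults.
    Illustrative helper only.
    """
    merged = dict(generate_defaults or {})  # None means no defaults were found
    merged.update(call_kwargs)
    return merged

# Example: defaults from generate_config.json, temperature overridden per call
defaults = {"temperature": 0.8, "top_p": 0.9}
print(merge_generate_kwargs(defaults, {"temperature": 0.5}))
# → {'temperature': 0.5, 'top_p': 0.9}
```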

Class Methods

from_pretrained

@classmethod
Qwen3TTSModel.from_pretrained(
    pretrained_model_name_or_path: str,
    **kwargs
) -> Qwen3TTSModel
Load a Qwen3-TTS model and its processor in HuggingFace from_pretrained style. This method:
  1. Loads the config via AutoConfig (registers qwen3_tts model type)
  2. Loads the model via AutoModel.from_pretrained(...), forwarding kwargs unchanged
  3. Loads the processor via AutoProcessor.from_pretrained(...)
  4. Loads optional generate_config.json from the model directory if present
Source: qwen_tts/inference/qwen3_tts_model.py:82-121

Parameters

pretrained_model_name_or_path
str
required
HuggingFace repository ID or local directory path containing the model. Examples:
  • "Qwen3-TTS-CustomVoice-2B"
  • "./local/model/path"
**kwargs
Any
Forwarded as-is into AutoModel.from_pretrained(...). Common examples:
  • device_map="cuda:0" - Load model on specific GPU
  • torch_dtype=torch.bfloat16 - Use bfloat16 precision
  • attn_implementation="flash_attention_2" - Use FlashAttention 2
  • trust_remote_code=True - Trust remote code (if required)

Returns

Qwen3TTSModel
Qwen3TTSModel
Wrapper instance containing the loaded model, processor, and generation defaults.

Example

from qwen_tts import Qwen3TTSModel
import torch

# Basic loading
model = Qwen3TTSModel.from_pretrained("Qwen3-TTS-CustomVoice-2B")

# Load with specific device and dtype
model = Qwen3TTSModel.from_pretrained(
    "Qwen3-TTS-CustomVoice-2B",
    device_map="cuda:0",
    torch_dtype=torch.bfloat16
)

# Load from local directory
model = Qwen3TTSModel.from_pretrained(
    "./my_local_model",
    device_map="auto"
)

Instance Methods

get_supported_languages

get_supported_languages() -> Optional[List[str]]
List supported language names for the current model. This is a convenience wrapper around model.get_supported_languages(). If the underlying model does not expose language constraints (returns None), this method also returns None. Source: qwen_tts/inference/qwen3_tts_model.py:861-877

Returns

languages
Optional[List[str]]
  • A sorted list of supported language names (lowercased), if available
  • None if the model does not provide supported languages

Example

languages = model.get_supported_languages()
if languages:
    print(f"Supported languages: {languages}")
else:
    print("All languages supported")

get_supported_speakers

get_supported_speakers() -> Optional[List[str]]
List supported speaker names for the current model. This is a convenience wrapper around model.get_supported_speakers(). If the underlying model does not expose speaker constraints (returns None), this method also returns None. Note: This is primarily used with CustomVoice models. Source: qwen_tts/inference/qwen3_tts_model.py:842-858

Returns

speakers
Optional[List[str]]
  • A sorted list of supported speaker names (lowercased), if available
  • None if the model does not provide supported speakers

Example

speakers = model.get_supported_speakers()
if speakers:
    print(f"Available speakers: {speakers}")
    # Use one of the speakers
    wavs, sr = model.generate_custom_voice(
        text="Hello",
        speaker=speakers[0],
        language="English"
    )
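Since both getters return lowercased, sorted names and use None to mean "no constraint", callers can validate input case-insensitively before generating. A minimal sketch of such client-side validation (the helper name is mine, not part of the API):

```python
from typing import List, Optional

def validate_choice(name: str, supported: Optional[List[str]]) -> str:
    """Validate a speaker or language name against a supported list.

    `supported` follows the convention of get_supported_speakers() /
    get_supported_languages(): a lowercased list, or None when the model
    imposes no constraint. Comparison is case-insensitive.
    """
    if supported is None:  # model exposes no constraint
        return name
    if name.lower() not in supported:
        raise ValueError(f"{name!r} not supported; choose from {supported}")
    return name

# Example with a hypothetical speaker list
print(validate_choice("Vivian", ["ethan", "vivian"]))  # → Vivian
```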

Generation Methods

Generation methods for CustomVoice, VoiceDesign, and Base models are available on Qwen3TTSModel instances. See Generation Methods and Voice Clone Prompt for complete documentation.
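All generation methods share the output format (wavs: List[np.ndarray], sample_rate: int) noted above. A minimal sketch for persisting that output as 16-bit WAV files, using only NumPy and the standard library (the helper name and the assumption that waveforms are mono floats in [-1, 1] are mine):

```python
import wave
from typing import List

import numpy as np

def save_wavs(wavs: List[np.ndarray], sample_rate: int, prefix: str = "out") -> List[str]:
    """Write each mono float waveform (assumed in [-1, 1]) to a 16-bit PCM WAV file."""
    paths = []
    for i, wav in enumerate(wavs):
        pcm = (np.clip(wav, -1.0, 1.0) * 32767).astype(np.int16)
        path = f"{prefix}_{i}.wav"
        with wave.open(path, "wb") as f:
            f.setnchannels(1)          # mono
            f.setsampwidth(2)          # 16-bit samples
            f.setframerate(sample_rate)
            f.writeframes(pcm.tobytes())
        paths.append(path)
    return paths

# Example with a synthetic 440 Hz tone standing in for real model output
sr = 24000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
print(save_wavs([tone], sr, prefix="demo"))  # → ['demo_0.wav']
```

In real use, the (wavs, sample_rate) tuple returned by a generation call is passed straight in, e.g. save_wavs(*model.generate_custom_voice(text="Hello", speaker="vivian", language="English")).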

Attributes

model
Qwen3TTSForConditionalGeneration
The underlying HuggingFace conditional generation model.
processor
Qwen3TTSProcessor
The text processor for tokenization.
generate_defaults
Dict[str, Any]
Default generation parameters loaded from generate_config.json.
device
torch.device
The device where the model is loaded (e.g., cuda:0, cpu).
