Qwen3TTSTokenizer Class
The Qwen3TTSTokenizer class wraps the Qwen3 TTS tokenizers with HuggingFace-style loading and inference. It provides:
- HuggingFace Integration: Load models using from_pretrained()
- Flexible Input Formats: Support for file paths, URLs, base64 strings, and NumPy arrays
- Batch Processing: Encode and decode multiple audio files simultaneously
- Automatic Resampling: Handles sample rate conversion automatically
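Automatic resampling means converting a waveform from its source sample rate to the rate the model expects. The sketch below illustrates the idea with plain linear interpolation in NumPy. It is an assumption for illustration, not the tokenizer's implementation: the helper `resample_linear` is hypothetical, and the 48 kHz and 24 kHz rates in the example are arbitrary, not the models' documented rates.

```python
import numpy as np


def resample_linear(wav: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Hypothetical helper: resample a mono waveform by linear interpolation.

    Illustrative only; not the library's resampler.
    """
    wav = np.asarray(wav, dtype=np.float32)
    if orig_sr == target_sr or wav.size == 0:
        return wav
    duration = wav.shape[0] / orig_sr            # length in seconds
    n_out = int(round(duration * target_sr))     # sample count at the new rate
    t_in = np.arange(wav.shape[0]) / orig_sr     # timestamps of input samples
    t_out = np.arange(n_out) / target_sr         # timestamps of output samples
    return np.interp(t_out, t_in, wav).astype(np.float32)


# One second of a 440 Hz tone at 48 kHz, converted to 24 kHz.
sr_in = 48_000
tone = np.sin(2 * np.pi * 440 * np.arange(sr_in) / sr_in)
out = resample_linear(tone, sr_in, 24_000)
print(out.shape)  # (24000,)
```

Linear interpolation keeps the sketch short, but it applies no low-pass filter before downsampling and can therefore alias; band-limited (polyphase or sinc) resamplers are the usual choice in practice.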
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:44
Key Features
Supported Input Formats
The tokenizer accepts audio in multiple formats:
- File paths: Local WAV files
- URLs: HTTP/HTTPS audio URLs
- Base64 strings: Encoded audio data (with or without a data URL prefix)
- NumPy arrays: Raw waveform arrays (requires the sr parameter)
- Batch inputs: Lists of any of the above formats
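To make the list of input formats concrete, here is a minimal sketch of how a loader might tell them apart and treat a list as a batch. The tokenizer's own dispatch logic is not shown on this page; the helpers `classify_audio_input`, `decode_base64_audio`, and `classify_batch` are hypothetical, and so is the heuristic of treating any string that is neither a URL nor a WAV path as a bare base64 payload.

```python
import base64
import os
from typing import List, Optional, Union

import numpy as np

AudioInput = Union[str, np.ndarray]


def classify_audio_input(item: AudioInput, sr: Optional[int] = None) -> str:
    """Hypothetical helper: report which supported format an input uses."""
    if isinstance(item, np.ndarray):
        # A raw waveform has no header, so the caller must give its sample rate.
        if sr is None:
            raise ValueError("NumPy input requires the sr parameter")
        return "array"
    if not isinstance(item, str):
        raise TypeError(f"unsupported input type: {type(item).__name__}")
    if item.startswith(("http://", "https://")):
        return "url"
    if item.startswith("data:"):
        return "base64"  # data URL, e.g. "data:audio/wav;base64,<payload>"
    if os.path.exists(item) or item.lower().endswith(".wav"):
        return "path"
    return "base64"  # assumption: any other string is a bare base64 payload


def decode_base64_audio(b64: str) -> bytes:
    """Decode base64 audio, dropping an optional 'data:...;base64,' prefix."""
    if b64.startswith("data:"):
        b64 = b64.split(",", 1)[1]
    return base64.b64decode(b64)


def classify_batch(items: Union[AudioInput, List[AudioInput]],
                   sr: Optional[int] = None) -> List[str]:
    """Treat a list as a batch; wrap a single item as a batch of one."""
    batch = items if isinstance(items, list) else [items]
    return [classify_audio_input(x, sr) for x in batch]


payload = base64.b64encode(b"RIFF....").decode()
print(classify_batch(["speech.wav", "https://example.com/a.wav",
                      payload, "data:audio/wav;base64," + payload]))
# ['path', 'url', 'base64', 'base64']
```

Checking for URLs and data URL prefixes before touching the filesystem keeps the cheap, unambiguous cases first; the base64 fallback is the weakest guess and is where a real implementation would likely need stricter validation.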
Model Variants
- 25Hz Tokenizer (Qwen/Qwen3-TTS-Tokenizer-25Hz): Returns audio codes, x-vectors, and reference mel-spectrograms
- 12Hz Tokenizer (Qwen/Qwen3-TTS-Tokenizer-12Hz): Returns multi-quantizer audio codes
Return Types
Quick Navigation
- Qwen3TTSTokenizer: Class initialization and loading methods
- Encode & Decode: Encoding audio to codes and decoding back to waveforms