The Tokenizer API provides a unified interface for encoding audio into discrete codes and decoding them back to waveforms. It supports both 25Hz and 12Hz tokenizer models.

Qwen3TTSTokenizer Class

The Qwen3TTSTokenizer class wraps the Qwen3-TTS tokenizer models with HuggingFace-style loading and inference. It provides:
  • HuggingFace Integration: Load models using from_pretrained()
  • Flexible Input Formats: Support for file paths, URLs, base64 strings, and numpy arrays
  • Batch Processing: Encode or decode multiple audio inputs in a single call
  • Automatic Resampling: Converts input audio to the sampling rate the model expects
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:44

Key Features

Supported Input Formats

The tokenizer accepts audio in multiple formats:
  • File paths: Local WAV files
  • URLs: HTTP/HTTPS audio URLs
  • Base64 strings: Encoded audio data (with or without data URL prefix)
  • NumPy arrays: Raw waveform arrays (requires sr parameter)
  • Batch inputs: Lists of any of the above formats
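The input rules above can be illustrated with a small dispatcher. This is a sketch, not the library's implementation: classify_audio_input and classify_batch are hypothetical helpers. The checks it uses are assumptions chosen to mirror the listed formats: an http(s) scheme, a data URL prefix, a .wav suffix, a strict base64 decode attempt, and duck-typing on shape/dtype for NumPy arrays. It uses only the standard library, so a NumPy array is recognized by its attributes rather than by importing NumPy.

```python
import base64
import binascii
from typing import Any, List, Optional


def classify_audio_input(audio: Any, sr: Optional[int] = None) -> str:
    """Hypothetical helper: label one input as 'url', 'base64', 'path' or 'array'."""
    if isinstance(audio, str):
        if audio.startswith(("http://", "https://")):
            return "url"
        if audio.startswith("data:") and ";base64," in audio:
            return "base64"  # base64 with a data URL prefix
        if audio.lower().endswith(".wav"):
            return "path"
        try:
            # Bare base64 without a prefix; validate=True rejects non-alphabet characters.
            base64.b64decode(audio, validate=True)
            return "base64"
        except binascii.Error:
            return "path"  # fall back to treating the string as a local file path
    if hasattr(audio, "shape") and hasattr(audio, "dtype"):
        # NumPy-like waveform: the original sampling rate cannot be inferred from samples.
        if sr is None:
            raise ValueError("sr is required for raw waveform input")
        return "array"
    raise TypeError(f"unsupported audio input type: {type(audio).__name__}")


def classify_batch(audio: Any, sr: Optional[int] = None) -> List[str]:
    """Hypothetical helper: wrap a single input in a list, then classify each item."""
    items = audio if isinstance(audio, list) else [audio]
    return [classify_audio_input(item, sr) for item in items]
```

Bare base64 detection is heuristic here: a short path made only of base64 characters (for example, abcd) would be classified as base64. This is one reason a real implementation may rely on other signals.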

Model Variants

  • 25Hz Tokenizer (Qwen/Qwen3-TTS-Tokenizer-25Hz): Returns audio codes, x-vectors, and reference mel-spectrograms
  • 12Hz Tokenizer (Qwen/Qwen3-TTS-Tokenizer-12Hz): Returns multi-quantizer audio codes
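The frame rate in each model name sets how many code frames a clip produces. The sketch below takes the rates at face value (25 and 12 frames per second). Several details are assumptions and should be checked against the model configuration: the exact rates, how a partial final frame is rounded, and how many quantizers the 12Hz model uses per frame. num_code_frames, code_grid_shape and codes_per_second are hypothetical helpers.

```python
import math
from typing import Tuple


def num_code_frames(duration_s: float, frame_rate_hz: float) -> int:
    """Hypothetical helper: code frames for a clip, counting a partial last frame as a full one."""
    if duration_s < 0 or frame_rate_hz <= 0:
        raise ValueError("duration must be >= 0 and frame rate must be > 0")
    return math.ceil(duration_s * frame_rate_hz)


def code_grid_shape(
    duration_s: float, frame_rate_hz: float, num_quantizers: int = 1
) -> Tuple[int, int]:
    """Hypothetical helper: (frames, quantizers) layout of the codes for one clip.

    num_quantizers is an assumption; read the real value from the model config.
    """
    return (num_code_frames(duration_s, frame_rate_hz), num_quantizers)


def codes_per_second(frame_rate_hz: float, num_quantizers: int = 1) -> float:
    """Hypothetical helper: total discrete codes emitted per second of audio."""
    return frame_rate_hz * num_quantizers
```

For a 10-second clip this gives 250 frames at 25 Hz and 120 frames at 12 Hz. With a multi-quantizer model, each frame carries one code per quantizer, so the code grid becomes (120, num_quantizers).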

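Return Types

What encode() returns depends on the model variant (see Model Variants). decode() returns the decoded waveforms together with their sample rate.

Note: For NumPy array input, you must pass the sr parameter to specify the original sampling rate.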
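The sr requirement exists because a bare array of samples carries no timing information. For example, 16,000 samples last one second at 16 kHz but only two thirds of a second at 24 kHz. The sketch below shows that arithmetic and a naive linear-interpolation resampler. It illustrates what sample rate conversion involves, not what Qwen3TTSTokenizer does internally. duration_seconds and resample_linear are hypothetical helpers, and production resamplers also apply an anti-aliasing low-pass filter, which this sketch omits.

```python
from typing import List, Sequence


def duration_seconds(num_samples: int, sr: int) -> float:
    """The same array means a different duration at a different sampling rate."""
    return num_samples / sr


def resample_linear(samples: Sequence[float], orig_sr: int, target_sr: int) -> List[float]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if orig_sr == target_sr or len(samples) == 0:
        return list(samples)
    # Keep the duration constant: len / orig_sr == n_out / target_sr.
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1
    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)  # past the end, both neighbours are the last sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

Without the original rate, neither n_out nor step can be computed, which is why raw waveform input needs sr.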


Basic Usage

```python
from qwen_tts import Qwen3TTSTokenizer

# Load the 12Hz tokenizer onto the first GPU
tokenizer = Qwen3TTSTokenizer.from_pretrained(
    "Qwen/Qwen3-TTS-Tokenizer-12Hz",
    device_map="cuda:0",
)

# Encode audio into discrete codes
encoded = tokenizer.encode("audio.wav")

# Decode the codes back to waveforms
wavs, sample_rate = tokenizer.decode(encoded)
```
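If no recording is at hand, a test input for the snippet above can be generated with the Python standard library. The sketch below renders a one-second sine tone as 16-bit mono WAV. It also produces the bare-base64 and data-URL forms listed under Supported Input Formats. The helper names, the 440 Hz tone, and the 24 kHz rate are arbitrary choices for illustration. Because the tokenizer resamples automatically, the choice of rate should not matter.

```python
import base64
import io
import math
import struct
import wave


def sine_wav_bytes(freq_hz: float = 440.0, duration_s: float = 1.0, sr: int = 24000) -> bytes:
    """Render a mono 16-bit PCM sine tone and return it as in-memory WAV bytes."""
    n_samples = int(duration_s * sr)
    amplitude = 0.3 * 32767  # stay well below full scale
    pcm = b"".join(
        struct.pack("<h", int(amplitude * math.sin(2 * math.pi * freq_hz * i / sr)))
        for i in range(n_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)  # mono
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(sr)
        w.writeframes(pcm)
    return buf.getvalue()


def to_base64(wav: bytes, data_url: bool = False) -> str:
    """Encode WAV bytes as bare base64, or as a data URL when data_url is True."""
    b64 = base64.b64encode(wav).decode("ascii")
    return f"data:audio/wav;base64,{b64}" if data_url else b64


def write_test_wav(path: str = "audio.wav", **kwargs) -> str:
    """Write the tone to a local WAV file and return its path."""
    with open(path, "wb") as f:
        f.write(sine_wav_bytes(**kwargs))
    return path
```

Calling write_test_wav() creates audio.wav in the working directory, matching the path passed to tokenizer.encode above. Calling to_base64(sine_wav_bytes(), data_url=True) yields a string in the data-URL form.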
