
Class Definition

class Qwen3TTSTokenizer:
    def __init__(self):
        self.model = None
        self.feature_extractor = None
        self.config = None
        self.device = None
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:44-61
Do not instantiate directly. Use from_pretrained() instead.

Methods

from_pretrained()

Load a pretrained tokenizer model with HuggingFace-style initialization.
@classmethod
def from_pretrained(
    cls,
    pretrained_model_name_or_path: str,
    **kwargs
) -> "Qwen3TTSTokenizer"
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:63-99

Parameters

pretrained_model_name_or_path
str
required
HuggingFace model repository ID or local directory path. Examples:
  • "Qwen/Qwen3-TTS-Tokenizer-25Hz"
  • "Qwen/Qwen3-TTS-Tokenizer-12Hz"
  • "/path/to/local/model"
**kwargs
Any
Additional keyword arguments forwarded to AutoModel.from_pretrained(). Common options:
  • device_map: Device placement (e.g., "cuda:0", "cpu", "auto")
  • torch_dtype: Model precision (e.g., torch.bfloat16, torch.float16)
  • attn_implementation: Attention implementation (e.g., "eager", "flash_attention_2")

Returns

tokenizer
Qwen3TTSTokenizer
Initialized tokenizer instance with loaded model, feature extractor, and config.

Example

from qwen_tts import Qwen3TTSTokenizer
import torch

# Basic loading
tokenizer = Qwen3TTSTokenizer.from_pretrained(
    "Qwen/Qwen3-TTS-Tokenizer-12Hz"
)

# With device and dtype
tokenizer = Qwen3TTSTokenizer.from_pretrained(
    "Qwen/Qwen3-TTS-Tokenizer-12Hz",
    device_map="cuda:0",
    torch_dtype=torch.bfloat16
)

# Load 25Hz tokenizer
tokenizer_25hz = Qwen3TTSTokenizer.from_pretrained(
    "Qwen/Qwen3-TTS-Tokenizer-25Hz",
    device_map="cuda:0"
)

load_audio()

Load audio from a file path, URL, or base64 string and resample it to the target sample rate.
def load_audio(
    self,
    x: str,
    target_sr: int,
) -> np.ndarray
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:122-158

Parameters

x
str
required
Audio source:
  • File path to WAV file
  • HTTP/HTTPS URL to audio file
  • Base64 encoded audio string (raw or data URL format)
target_sr
int
required
Target sampling rate in Hz for resampling.

Returns

waveform
np.ndarray
1-D float32 numpy array containing the resampled audio waveform at target_sr.

Example

import soundfile as sf

# Load from file path
audio = tokenizer.load_audio("audio.wav", target_sr=16000)

# Load from URL
audio_url = tokenizer.load_audio(
    "https://example.com/audio.wav",
    target_sr=16000
)

# Load from base64
base64_str = "data:audio/wav;base64,UklGRiQAAABXQVZFZm10..."
audio_b64 = tokenizer.load_audio(base64_str, target_sr=16000)

# Save loaded audio
sf.write("output.wav", audio, 16000)
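
The two base64 forms accepted by load_audio() (raw, or wrapped in a data URL) differ only in the "data:<mime>;base64," prefix. The sketch below illustrates that difference with the standard library alone. decode_base64_audio is a hypothetical helper written for this page, not a function in qwen_tts, and it is not necessarily how load_audio() handles the input internally.

```python
import base64


def decode_base64_audio(x: str) -> bytes:
    """Decode a raw base64 string or a data URL payload into bytes.

    Hypothetical helper for illustration; not part of qwen_tts.
    """
    if x.startswith("data:"):
        # Data URL form: "data:audio/wav;base64,<payload>"
        header, sep, payload = x.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("expected data:<mime>;base64,<payload>")
    else:
        # Raw form: the whole string is the payload
        payload = x
    # validate=True rejects characters outside the base64 alphabet
    return base64.b64decode(payload, validate=True)


# Both forms decode to the same bytes
wav_bytes = b"RIFF\x24\x00\x00\x00WAVEfmt "
raw = base64.b64encode(wav_bytes).decode("ascii")
data_url = "data:audio/wav;base64," + raw

decoded_raw = decode_base64_audio(raw)
decoded_url = decode_base64_audio(data_url)
```

The decoded bytes would still need to be parsed as audio (for example with soundfile) and resampled, which load_audio() does for you.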

Utility Methods

get_model_type()

Get the underlying tokenizer model type.
def get_model_type(self) -> str
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:367-375
Returns: "qwen3_tts_tokenizer_25hz" or "qwen3_tts_tokenizer_12hz"

get_input_sample_rate()

Get the expected input sample rate for encoding.
def get_input_sample_rate(self) -> int
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:377-384
Returns: Input sample rate in Hz

get_output_sample_rate()

Get the output sample rate for decoded waveforms.
def get_output_sample_rate(self) -> int
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:386-393
Returns: Output sample rate in Hz

get_encode_downsample_rate()

Get the encoder downsample rate (waveform samples per code step).
def get_encode_downsample_rate(self) -> int
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:395-402
Returns: Encode downsample rate

get_decode_upsample_rate()

Get the decoder upsample rate (waveform samples per code step).
def get_decode_upsample_rate(self) -> int
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:404-411
Returns: Decode upsample rate
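
The sample rates and the per-step rates above are related by simple arithmetic, which the following sketch works through. The helper names are hypothetical, the numbers (24000 Hz, 1920 samples per step) are illustrative values chosen for the example rather than values taken from this page, and rounding up the last partial step is an assumption: the real encoder may pad or truncate differently.

```python
import math


def code_frame_rate(sample_rate: int, downsample_rate: int) -> float:
    """Code steps produced per second of input audio."""
    return sample_rate / downsample_rate


def expected_code_steps(num_samples: int, downsample_rate: int) -> int:
    """Approximate number of code steps for a waveform.

    Assumes the last partial window still yields one step (ceil).
    """
    if num_samples < 0 or downsample_rate <= 0:
        raise ValueError("num_samples must be >= 0 and downsample_rate > 0")
    return math.ceil(num_samples / downsample_rate)


def decoded_num_samples(num_steps: int, upsample_rate: int) -> int:
    """Waveform samples produced when decoding num_steps code steps."""
    return num_steps * upsample_rate


# Illustrative values only. With a loaded tokenizer you would use:
#   sample_rate = tokenizer.get_input_sample_rate()
#   downsample_rate = tokenizer.get_encode_downsample_rate()
#   upsample_rate = tokenizer.get_decode_upsample_rate()
sample_rate = 24000
downsample_rate = 1920
upsample_rate = 1920

frame_rate = code_frame_rate(sample_rate, downsample_rate)
steps = expected_code_steps(3 * sample_rate, downsample_rate)  # 3 s clip
out_samples = decoded_num_samples(steps, upsample_rate)
```

Under these assumed values a 3-second clip maps to slightly more decoded samples than went in, because the final partial step is rounded up to a full one.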

AudioInput Type

The tokenizer accepts the following input types (defined in qwen_tts/inference/qwen3_tts_tokenizer.py:36-41):
AudioInput = Union[
    str,              # WAV path, URL, or base64 string
    np.ndarray,       # 1-D float array
    List[str],        # List of paths/URLs/base64 strings
    List[np.ndarray], # List of 1-D float arrays
]
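
Because AudioInput covers both single items and lists, calling code often wants to reduce every case to one batch shape first. The sketch below shows one way to do that. normalize_audio_input is a hypothetical helper, not part of qwen_tts, and the library may normalize its inputs differently.

```python
from typing import Any, List, Tuple


def normalize_audio_input(audio: Any) -> Tuple[List[Any], bool]:
    """Return (items, was_batched) for a value shaped like AudioInput.

    Hypothetical helper for illustration; not part of qwen_tts.
    """
    if isinstance(audio, list):
        if not audio:
            raise ValueError("audio list must not be empty")
        # List[str] or List[np.ndarray]: already a batch
        return list(audio), True
    # A single str or a single np.ndarray is not a list,
    # so it is wrapped into a one-item batch.
    return [audio], False


single_items, single_batched = normalize_audio_input("a.wav")
batch_items, batch_batched = normalize_audio_input(["a.wav", "b.wav"])
```

Tracking was_batched lets a caller return a single result for a single input and a list for a list, mirroring the shape it was given.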
