encode()
Batch-encode audio into discrete codes and conditioning features.
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:208-257
Parameters
Audio input in any of these formats:
- Single file path (str): "audio.wav"
- Single URL (str): "https://example.com/audio.wav"
- Single base64 string (str): "data:audio/wav;base64,..." or raw base64
- Single NumPy array (np.ndarray): 1-D float32 waveform (requires sr)
- List of paths/URLs/base64 strings (List[str]): ["audio1.wav", "audio2.wav"]
- List of NumPy arrays (List[np.ndarray]): [waveform1, waveform2] (requires sr)
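The accepted forms above can be normalized into one batch list before encoding. The sketch below is a hypothetical helper (`to_audio_batch` is not part of qwen_tts) that applies only the rules stated in the list: a single string or array becomes a batch of one, and NumPy input without `sr` is rejected. The library's own normalization may differ in detail.

```python
from typing import List, Optional, Union

import numpy as np

AudioItem = Union[str, np.ndarray]


def to_audio_batch(
    audio: Union[AudioItem, List[AudioItem]], sr: Optional[int] = None
) -> List[AudioItem]:
    """Hypothetical helper: wrap single inputs in a list and enforce the `sr` rule."""
    # A single path, URL, base64 string, or array becomes a batch of one.
    batch = audio if isinstance(audio, list) else [audio]
    for item in batch:
        if isinstance(item, np.ndarray):
            # NumPy waveforms carry no sample-rate metadata.
            if sr is None:
                raise ValueError("NumPy waveform input requires `sr`")
        elif not isinstance(item, str):
            raise TypeError(f"unsupported audio input type: {type(item).__name__}")
    return batch
```

For example, `to_audio_batch("audio.wav")` yields `["audio.wav"]`, while passing a bare array without `sr` raises `ValueError`.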
Original sampling rate in Hz, required for NumPy waveform input.
If True, returns a ModelOutput object. If False, returns a raw tuple.
Returns
25Hz Tokenizer Output
- List of discrete code sequences. Each tensor has shape (codes_len,).
- List of speaker embeddings. Each tensor has shape (xvector_dim,).
- List of reference mel-spectrograms. Each tensor has shape (mel_len, mel_dim).
12Hz Tokenizer Output
- List of multi-quantizer code sequences. Each tensor has shape (codes_len, num_quantizers).
Examples
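One way to confirm which layout a tokenizer produced is to check the code shapes listed under Returns. The sketch below is hypothetical (`check_code_shapes` is not a library function); it treats `num_quantizers=None` as the 25Hz layout and an integer as the 12Hz layout. The actual quantizer count comes from the model configuration and is not stated here, so the value used in any call is an assumption. It works on NumPy arrays and torch tensors, since both expose `.ndim` and `.shape`.

```python
import numpy as np


def check_code_shapes(audio_codes, num_quantizers=None):
    """Validate per-item code shapes and return each item's codes_len.

    num_quantizers=None -> 25Hz layout, each item shaped (codes_len,)
    num_quantizers=int  -> 12Hz layout, each item shaped (codes_len, num_quantizers)
    """
    for i, codes in enumerate(audio_codes):
        if num_quantizers is None:
            ok = codes.ndim == 1
        else:
            ok = codes.ndim == 2 and codes.shape[1] == num_quantizers
        if not ok:
            raise ValueError(f"item {i} has unexpected shape {tuple(codes.shape)}")
    # The first dimension is codes_len in both layouts.
    return [int(codes.shape[0]) for codes in audio_codes]
```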
Single File Path
examples/test_tokenizer_12hz.py
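A minimal round-trip sketch for a single file. It assumes `tokenizer` is an already-loaded Qwen3-TTS tokenizer (loading is not shown) and that `decode()` returns the waveform list and the sample rate as a pair, matching its documented return values. `roundtrip_file` is a hypothetical helper name.

```python
def roundtrip_file(tokenizer, path):
    """Encode one audio file and decode it back to a waveform.

    `tokenizer` is assumed to expose the documented encode()/decode() methods.
    """
    # Outputs are per-item lists even for a single input.
    enc = tokenizer.encode(path)
    # Assumed to return (list of 1-D float32 waveforms, output sample rate).
    wavs, out_sr = tokenizer.decode(enc)
    return wavs[0], out_sr
```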
Batch of File Paths
examples/test_tokenizer_12hz.py
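For batches, a quick sanity check is that `encode()` returns one code sequence per input. The sketch assumes the codes are exposed under the name `audio_codes` (the same key `decode()` accepts) and handles both dict-like and attribute-style outputs; `get_audio_codes` and `encode_batch` are hypothetical helpers.

```python
def get_audio_codes(enc):
    """Read the code list from an encode() result (dict-like or attribute-style)."""
    # `audio_codes` is an assumed field name, matching the key decode() accepts.
    if isinstance(enc, dict):
        return enc["audio_codes"]
    return enc.audio_codes


def encode_batch(tokenizer, paths):
    """Encode several files in one call and check there is one result per input."""
    paths = list(paths)
    enc = tokenizer.encode(paths)
    codes = get_audio_codes(enc)
    if len(codes) != len(paths):
        raise RuntimeError(f"expected {len(paths)} code sequences, got {len(codes)}")
    return enc
```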
NumPy Array Input
examples/test_tokenizer_12hz.py
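NumPy input must be a 1-D float32 waveform and must come with `sr`. The sketch below builds a synthetic test tone in that layout and passes the rate explicitly; the tone parameters and the helper names `make_test_tone` and `encode_waveform` are illustrative assumptions, and `tokenizer` is assumed to be already loaded.

```python
import numpy as np


def make_test_tone(sr=16000, seconds=1.0, freq=440.0):
    """Build a 1-D float32 sine wave, the waveform layout encode() expects."""
    t = np.arange(int(sr * seconds)) / sr
    return (0.5 * np.sin(2 * np.pi * freq * t)).astype(np.float32)


def encode_waveform(tokenizer, wav, sr):
    # A bare array has no sample-rate metadata, so sr is passed explicitly.
    return tokenizer.encode(wav, sr=sr)
```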
decode()
Decode discrete codes back to audio waveforms.
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:259-365
Parameters
Encoded audio data in one of these formats:
- ModelOutput from encode() (recommended): pass the output directly
- Single dict, for custom pipelines:
  - 25Hz: {"audio_codes": codes, "xvectors": xvec, "ref_mels": mels}
  - 12Hz: {"audio_codes": codes}
- List of dicts, for batch decoding
- Values can be torch tensors or NumPy arrays
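When codes come from a custom pipeline rather than straight from `encode()`, they can be packed into the list-of-dicts form. This hypothetical helper (`make_decode_items`) uses only the keys shown above; the 25Hz fields are added when speaker embeddings and reference mels are supplied.

```python
def make_decode_items(audio_codes, xvectors=None, ref_mels=None):
    """Pack per-item outputs into a list of decode() dicts.

    12Hz: pass only audio_codes.
    25Hz: also pass xvectors and ref_mels, one entry per item in audio_codes.
    """
    if (xvectors is None) != (ref_mels is None):
        raise ValueError("25Hz items need both xvectors and ref_mels")
    if xvectors is not None and not (len(xvectors) == len(ref_mels) == len(audio_codes)):
        raise ValueError("audio_codes, xvectors and ref_mels must have the same length")
    items = []
    for i, codes in enumerate(audio_codes):
        item = {"audio_codes": codes}
        if xvectors is not None:
            item["xvectors"] = xvectors[i]
            item["ref_mels"] = ref_mels[i]
        items.append(item)
    return items
```

The resulting list would then be passed as `tokenizer.decode(make_decode_items(codes))`.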
Returns
List of decoded waveforms. Each array is 1-D float32.
Output sample rate in Hz (from model configuration).
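Because each decoded waveform is a 1-D float32 array and the sample rate is returned alongside it, saving the result needs only the standard library. This sketch writes 16-bit mono PCM with Python's `wave` module; `write_wav` is a hypothetical helper, and the clip-and-scale conversion is one common choice rather than anything the library prescribes.

```python
import wave

import numpy as np


def write_wav(path, wav, sr):
    """Save one decoded 1-D float32 waveform as 16-bit mono PCM."""
    # Clip to [-1, 1] before scaling so out-of-range samples do not wrap around.
    pcm = (np.clip(wav, -1.0, 1.0) * 32767).astype("<i2")  # little-endian int16
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit
        f.setframerate(int(sr))
        f.writeframes(pcm.tobytes())
```

Typical use would be `wavs, out_sr = tokenizer.decode(enc)` followed by `write_wav("out_0.wav", wavs[0], out_sr)`.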
Examples
Direct Decode from encode()
examples/test_tokenizer_12hz.py
Decode from Dict (12Hz)
examples/test_tokenizer_12hz.py
Decode from List of Dicts (12Hz)
examples/test_tokenizer_12hz.py
Decode with NumPy Arrays (12Hz)
examples/test_tokenizer_12hz.py
25Hz vs 12Hz Decoding
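Based on the dict formats listed under Parameters, a 25Hz decode item carries speaker embeddings and reference mels alongside the codes, while a 12Hz item needs only the codes. The sketch below encodes that difference as a lookup; the variant labels `"25hz"` and `"12hz"` and the helper `missing_decode_keys` are hypothetical.

```python
REQUIRED_DECODE_KEYS = {
    "25hz": {"audio_codes", "xvectors", "ref_mels"},
    "12hz": {"audio_codes"},
}


def missing_decode_keys(item, variant):
    """Return the keys a decode() dict still lacks for the given tokenizer variant."""
    return sorted(REQUIRED_DECODE_KEYS[variant] - set(item))
```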
Input Format Details
The AudioInput type (defined in qwen_tts/inference/qwen3_tts_tokenizer.py:36-41) supports:
Supported String Formats
File Paths
Local filesystem paths to audio files.
HTTP/HTTPS URLs
Remote audio URLs, detected by their scheme and netloc.
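A scheme-and-netloc check can be sketched with `urllib.parse.urlparse`. Restricting the scheme to `http`/`https` is an assumption drawn from the heading; the library's own check may accept or reject other cases differently. `looks_like_url` is a hypothetical name.

```python
from urllib.parse import urlparse


def looks_like_url(s):
    """Treat a string as a remote URL when it has an http(s) scheme and a host."""
    parsed = urlparse(s)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Note that a data URL such as `"data:audio/wav;base64,..."` parses with scheme `data` and an empty netloc, so it is not mistaken for a remote URL.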
URL detection logic is in qwen_tts/inference/qwen3_tts_tokenizer.py:109-114.
Base64 Strings
Base64-encoded audio, with or without a data URL prefix.
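Both accepted base64 forms can be produced from raw file bytes with the standard library. The sketch returns the raw string and the data-URL-prefixed string; the default MIME type `audio/wav` and the helper name `to_base64_inputs` are assumptions for illustration.

```python
import base64


def to_base64_inputs(audio_bytes, mime="audio/wav"):
    """Return the two base64 forms: raw, and prefixed as a data URL."""
    raw = base64.b64encode(audio_bytes).decode("ascii")
    return raw, f"data:{mime};base64,{raw}"
```

For a file on disk, the bytes would come from `open("audio.wav", "rb").read()`, and either returned string could then be passed to `encode()`.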
Base64 detection heuristics are in qwen_tts/inference/qwen3_tts_tokenizer.py:101-107.
NumPy Array Requirements
Multi-channel arrays are automatically converted to mono by averaging channels (qwen_tts/inference/qwen3_tts_tokenizer.py:201-202).
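Averaging channels to mono can be sketched in a few lines of NumPy. Which axis holds the channels in the library's implementation is not stated here, so `channel_axis` is an assumption (defaulting to a `(num_samples, num_channels)` layout), and `to_mono` is a hypothetical helper.

```python
import numpy as np


def to_mono(wav, channel_axis=-1):
    """Average channels into a 1-D float32 waveform.

    channel_axis=-1 assumes (num_samples, num_channels); pass 0 for
    (num_channels, num_samples). 1-D input is returned as float32 unchanged.
    """
    wav = np.asarray(wav, dtype=np.float32)
    if wav.ndim == 1:
        return wav
    return wav.mean(axis=channel_axis).astype(np.float32)
```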