
encode()

Batch-encode audio into discrete codes and conditioning features.
def encode(
    self,
    audios: AudioInput,
    sr: Optional[int] = None,
    return_dict: bool = True,
)
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:208-257

Parameters

audios
AudioInput
required
Audio input in any of these formats:
  • Single file path (str): "audio.wav"
  • Single URL (str): "https://example.com/audio.wav"
  • Single base64 (str): "data:audio/wav;base64,..." or raw base64
  • Single numpy array (np.ndarray): 1-D float32 waveform (requires sr)
  • List of paths/URLs/base64 (List[str]): ["audio1.wav", "audio2.wav"]
  • List of numpy arrays (List[np.ndarray]): [waveform1, waveform2] (requires sr)
sr
Optional[int]
default:"None"
Original sampling rate in Hz for numpy waveform input.
Required when audios is np.ndarray or List[np.ndarray].
return_dict
bool
default:"True"
If True, returns a ModelOutput object. If False, returns a raw tuple.

Returns

25Hz Tokenizer Output

audio_codes
List[torch.LongTensor]
List of discrete code sequences. Each tensor has shape (codes_len,).
xvectors
List[torch.FloatTensor]
List of speaker embeddings. Each tensor has shape (xvector_dim,).
ref_mels
List[torch.FloatTensor]
List of reference mel-spectrograms. Each tensor has shape (mel_len, mel_dim).

12Hz Tokenizer Output

audio_codes
List[torch.LongTensor]
List of multi-quantizer code sequences. Each tensor has shape (codes_len, num_quantizers).

Examples

Single File Path

examples/test_tokenizer_12hz.py
from qwen_tts import Qwen3TTSTokenizer
import soundfile as sf

audio_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/tokenizer_demo_1.wav"

tokenizer = Qwen3TTSTokenizer.from_pretrained(
    "Qwen/Qwen3-TTS-Tokenizer-12Hz",
    device_map="cuda:0"
)

# Encode single audio
enc = tokenizer.encode(audio_url)
wavs, out_sr = tokenizer.decode(enc)
sf.write("decoded_single.wav", wavs[0], out_sr)

Batch of File Paths

examples/test_tokenizer_12hz.py
audio_1 = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/tokenizer_demo_1.wav"
audio_2 = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/tokenizer_demo_2.wav"

# Encode batch
enc_batch = tokenizer.encode([audio_1, audio_2])
wavs_batch, out_sr = tokenizer.decode(enc_batch)

for i, wav in enumerate(wavs_batch):
    sf.write(f"decoded_batch_{i}.wav", wav, out_sr)

NumPy Array Input

examples/test_tokenizer_12hz.py
import requests
import io
import soundfile as sf

# Load audio as a numpy array; dtype="float32" matches the documented input format
data = requests.get(audio_url, timeout=30).content
y, sr = sf.read(io.BytesIO(data), dtype="float32")

# Encode numpy array (must pass sr)
enc_numpy = tokenizer.encode(y, sr=sr)
wavs_numpy, out_sr = tokenizer.decode(enc_numpy)
sf.write("decoded_numpy.wav", wavs_numpy[0], out_sr)
The tokenizer automatically resamples all inputs to the model’s expected sample rate.

decode()

Decode discrete codes back to audio waveforms.
def decode(
    self,
    encoded,
) -> Tuple[List[np.ndarray], int]
Defined in qwen_tts/inference/qwen3_tts_tokenizer.py:259-365

Parameters

encoded
Any
required
Encoded audio data in one of these formats:
  1. ModelOutput from encode() (recommended): Pass the output directly
  2. Single dict: For custom pipelines
    • 25Hz: {"audio_codes": codes, "xvectors": xvec, "ref_mels": mels}
    • 12Hz: {"audio_codes": codes}
  3. List of dicts: For batch decoding
    • Values can be torch tensors or numpy arrays
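
The three accepted shapes can all be reduced to a list of per-sample dicts. The following is a minimal sketch of that normalization, not the library's own code: `FakeEncodeOutput` and `to_batch` are hypothetical names, and plain Python lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FakeEncodeOutput:
    """Hypothetical stand-in for the ModelOutput returned by encode() (12Hz case)."""
    audio_codes: List[Any]


def to_batch(encoded: Any) -> List[Dict[str, Any]]:
    """Reduce any accepted decode() input shape to a list of per-sample dicts."""
    if isinstance(encoded, dict):
        # Format 2: a single dict describes one sample.
        return [encoded]
    if isinstance(encoded, list):
        # Format 3: a list of dicts is already a batch.
        if not all(isinstance(item, dict) for item in encoded):
            raise TypeError("every list element must be a dict")
        return list(encoded)
    if hasattr(encoded, "audio_codes"):
        # Format 1: an encode() output holds one entry per sample.
        return [{"audio_codes": codes} for codes in encoded.audio_codes]
    raise TypeError(f"unsupported input type: {type(encoded).__name__}")
```

Passing the encode() output directly remains the recommended path; a helper like this is only useful when codes travel through a custom pipeline first.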

Returns

wavs
List[np.ndarray]
List of decoded waveforms. Each array is 1-D float32.
sample_rate
int
Output sample rate in Hz (from model configuration).

Examples

Direct Decode from encode()

examples/test_tokenizer_12hz.py
# Recommended: Pass encode output directly
enc = tokenizer.encode(audio_url)
wavs, sr = tokenizer.decode(enc)

sf.write("decoded.wav", wavs[0], sr)

Decode from Dict (12Hz)

examples/test_tokenizer_12hz.py
# Encode batch first
enc_batch = tokenizer.encode([audio_1, audio_2])

# Decode single sample as dict
dict_input = {"audio_codes": enc_batch.audio_codes[0]}
wavs_dict, sr = tokenizer.decode(dict_input)
sf.write("decoded_dict.wav", wavs_dict[0], sr)

Decode from List of Dicts (12Hz)

examples/test_tokenizer_12hz.py
# Create list of dicts from batch encoding
list_dict_input = [
    {"audio_codes": c} for c in enc_batch.audio_codes
]
wavs_list, sr = tokenizer.decode(list_dict_input)

for i, wav in enumerate(wavs_list):
    sf.write(f"decoded_{i}.wav", wav, sr)

Decode with NumPy Arrays (12Hz)

examples/test_tokenizer_12hz.py
# Convert codes to numpy for serialization
list_dict_numpy = [
    {"audio_codes": c.cpu().numpy()}
    for c in enc_batch.audio_codes
]
wavs_numpy, sr = tokenizer.decode(list_dict_numpy)

for i, wav in enumerate(wavs_numpy):
    sf.write(f"decoded_numpy_{i}.wav", wav, sr)
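
Since decode() accepts numpy arrays, codes can be written to disk and reloaded later. The sketch below shows one possible round trip through a `.npz` archive; `save_codes` and `load_as_decode_input` are hypothetical helpers, not part of qwen_tts, and the arrays are assumed to have already been converted with `.cpu().numpy()` as in the example above.

```python
import numpy as np


def save_codes(codes_list, file) -> None:
    """Write each sample's codes under its own key, since lengths differ per sample."""
    arrays = {f"codes_{i}": np.asarray(c) for i, c in enumerate(codes_list)}
    np.savez(file, **arrays)


def load_as_decode_input(file):
    """Rebuild the list-of-dicts form that decode() accepts for the 12Hz tokenizer."""
    with np.load(file) as archive:
        count = len(archive.files)
        # Index by position so the original sample order is preserved.
        return [{"audio_codes": archive[f"codes_{i}"]} for i in range(count)]
```

The reloaded list would then be passed as `tokenizer.decode(load_as_decode_input("codes.npz"))`, where `codes.npz` is a placeholder filename.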

25Hz vs 12Hz Decoding

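# Assumes tokenizer_25hz is a loaded 25Hz tokenizer and audio is any supported AudioInput.
# 25Hz decoding requires all three components: audio_codes, xvectors, and ref_mels.
enc_25hz = tokenizer_25hz.encode(audio)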

# Option 1: Direct decode
wavs, sr = tokenizer_25hz.decode(enc_25hz)

# Option 2: Manual dict
dict_25hz = {
    "audio_codes": enc_25hz.audio_codes[0],
    "xvectors": enc_25hz.xvectors[0],
    "ref_mels": enc_25hz.ref_mels[0],
}
wavs, sr = tokenizer_25hz.decode(dict_25hz)
25Hz tokenizer: decode() requires xvectors and ref_mels in addition to audio_codes. Omitting either one raises a ValueError.
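
When dicts are assembled by hand, it can help to check the keys before calling decode(). The following is a small sketch of such a pre-check. `REQUIRED_KEYS` and `check_decode_dict` are hypothetical names, and the key sets are taken from the parameter description above rather than from the library source.

```python
from typing import Any, Dict

# Required keys per tokenizer type, as listed in the decode() parameter description.
REQUIRED_KEYS = {
    "25hz": ("audio_codes", "xvectors", "ref_mels"),
    "12hz": ("audio_codes",),
}


def check_decode_dict(item: Dict[str, Any], tokenizer_type: str) -> None:
    """Raise ValueError if a decode() dict lacks a key its tokenizer type needs."""
    missing = [key for key in REQUIRED_KEYS[tokenizer_type] if key not in item]
    if missing:
        raise ValueError(
            f"{tokenizer_type} decode input is missing: {', '.join(missing)}"
        )
```

A dict holding only `audio_codes` passes for `"12hz"` and fails for `"25hz"`, which mirrors the behaviour described for decode().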

Input Format Details

The AudioInput type (defined in qwen_tts/inference/qwen3_tts_tokenizer.py:36-41) supports:
AudioInput = Union[
    str,              # WAV path, URL, or base64 string
    np.ndarray,       # 1-D float array (requires sr parameter)
    List[str],        # Batch of paths/URLs/base64
    List[np.ndarray], # Batch of arrays (requires sr parameter)
]

Supported String Formats

Local filesystem paths to audio files:
tokenizer.encode("path/to/audio.wav")
tokenizer.encode(["audio1.wav", "audio2.wav"])
Remote audio URLs (detected by scheme and netloc):
tokenizer.encode("https://example.com/audio.wav")
tokenizer.encode("http://server.com/sounds/voice.wav")
URL detection logic in qwen_tts/inference/qwen3_tts_tokenizer.py:109-114
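
A check based on scheme and netloc can be sketched with the standard library as follows. This is an illustration of the idea, not the code at the referenced lines: `looks_like_url` is a hypothetical name, and limiting the scheme to http and https is an assumption.

```python
from urllib.parse import urlparse


def looks_like_url(text: str) -> bool:
    """Return True when text has an http(s) scheme and a network location."""
    try:
        parsed = urlparse(text)
    except ValueError:
        # urlparse rejects some malformed inputs, such as a broken IPv6 host.
        return False
    # Assumption: only http and https count as remote audio sources.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A relative path such as `path/to/audio.wav` has neither a scheme nor a netloc, so it falls through to the local-file case.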
Base64-encoded audio with or without data URL prefix:
# With data URL prefix
tokenizer.encode("data:audio/wav;base64,UklGRiQAAABXQVZF...")

# Raw base64 (detected when longer than 256 characters and free of path separators)
tokenizer.encode("UklGRiQAAABXQVZFZm10IBAAAAABAAEA...")
Base64 detection heuristics in qwen_tts/inference/qwen3_tts_tokenizer.py:101-107
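
The two cases above can be sketched as a single helper. This is a hedged reconstruction from the description on this page, not the library's implementation: `decode_base64_audio` and its `min_raw_len` parameter are hypothetical, and the strict validation step is an addition for illustration.

```python
import base64
import binascii
from typing import Optional


def decode_base64_audio(text: str, min_raw_len: int = 256) -> Optional[bytes]:
    """Return the decoded bytes if text looks like base64 audio, else None."""
    if text.startswith("data:") and ";base64," in text:
        # Data URL: keep only the payload after the ";base64," marker.
        payload = text.split(";base64,", 1)[1]
    elif len(text) > min_raw_len and "/" not in text and "\\" not in text:
        # Raw base64: long enough and free of path separators.
        payload = text
    else:
        return None
    try:
        return base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return None
```

Because the standard base64 alphabet itself contains `/`, a raw string that includes that character would fail the no-separator check as described here. Using the data URL prefix avoids that ambiguity.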

NumPy Array Requirements

When using numpy arrays, you must provide the sr parameter:
# Correct
enc = tokenizer.encode(waveform, sr=22050)

# Wrong - raises ValueError
enc = tokenizer.encode(waveform)  # Missing sr!
Multi-channel arrays are automatically converted to mono by averaging channels (qwen_tts/inference/qwen3_tts_tokenizer.py:201-202).
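
The two array rules above (sr is mandatory, extra channels are averaged) can be illustrated with a short sketch. `prepare_waveform` is a hypothetical helper, and the `(frames, channels)` layout is an assumption based on what `soundfile.read` returns; the library's own handling at the referenced lines may differ in detail.

```python
from typing import Optional

import numpy as np


def prepare_waveform(waveform: np.ndarray, sr: Optional[int]) -> np.ndarray:
    """Validate sr and downmix a multi-channel array to a 1-D float32 waveform."""
    if sr is None:
        raise ValueError("sr is required when passing a numpy waveform")
    samples = np.asarray(waveform, dtype=np.float32)
    if samples.ndim > 1:
        # Assumes (frames, channels) layout; average across the channel axis.
        samples = samples.mean(axis=-1)
    return samples
```

Downmixing before calling encode() is not required, since the tokenizer does it automatically; the sketch only shows what that step amounts to.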
