Overview
Decodes token tensors back into human-readable text. Reverses the tokenization process by converting token IDs to their corresponding text representation.
Function Signature
def decode(output_ids: torch.Tensor) -> str
Parameters
output_ids (torch.Tensor): Tensor of token IDs to decode. Can be 1D (a single sequence) or 2D (a batch of sequences). The tensor is moved from GPU to CPU automatically if needed.
Returns
Decoded text string. Special tokens (SOT, EOT) and padding are included in the output. The </w> BPE markers are converted to spaces.
Examples
Basic decoding
import open_clip
import torch
# Tokenize text
text = "a photo of a cat"
tokens = open_clip.tokenize(text)
# Decode back to text
decoded = open_clip.decode(tokens[0])
print(decoded)
# Output: '<start_of_text> a photo of a cat <end_of_text>'
Decode batch of tokens
# Tokenize multiple texts
texts = ["a cat", "a dog", "a bird"]
tokens = open_clip.tokenize(texts)
# Decode each sequence
for i, token_seq in enumerate(tokens):
    decoded = open_clip.decode(token_seq)
    print(f"Text {i}: {decoded}")
Decode model predictions
import torch
import open_clip
# Load model
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k'
)
# Tokenize the text (the model itself is not needed for decoding)
text = "a photo of a cat"
tokens = open_clip.tokenize(text)
# Decode the tokens
decoded = open_clip.decode(tokens[0])
print(f"Original: {text}")
print(f"Decoded: {decoded}")
Handle padding and special tokens
import open_clip
# Short text with padding
text = "cat"
tokens = open_clip.tokenize(text)
print(f"Token shape: {tokens.shape}") # torch.Size([1, 77])
# Decode includes padding (as empty space after EOT)
decoded = open_clip.decode(tokens[0])
print(f"Decoded: '{decoded}'")
# Contains: <start_of_text> cat <end_of_text> followed by padding
Remove special tokens
import open_clip
text = "a photo of a cat"
tokens = open_clip.tokenize(text)
decoded = open_clip.decode(tokens[0])
# Clean up the decoded text
cleaned = decoded.replace('<start_of_text>', '').replace('<end_of_text>', '').strip()
print(f"Cleaned: {cleaned}")
# Output: 'a photo of a cat'
Decode only non-padding tokens
import open_clip
import torch
text = "hello world"
tokens = open_clip.tokenize(text)
# Find non-zero (non-padding) tokens
non_padding_mask = tokens[0] != 0
non_padding_tokens = tokens[0][non_padding_mask]
# Decode only the actual content
decoded = open_clip.decode(non_padding_tokens)
print(decoded)
Decoding Process
The decode function:
- Converts token IDs to BPE subword strings
- Joins the subwords together
- Decodes the byte representation to UTF-8 text
- Replaces </w> markers with spaces
- Handles special tokens like <start_of_text> and <end_of_text>
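The steps above can be sketched in plain Python with a toy vocabulary. This is illustrative only: the real SimpleTokenizer uses the full CLIP BPE vocabulary and a byte-level decoder, and the IDs below (other than the special tokens) are made up:

```python
# Toy ID-to-subword table; the real decoder covers the full BPE vocab
toy_decoder = {
    49406: "<start_of_text>",
    10: "a</w>",
    11: "pho",
    12: "to</w>",
    49407: "<end_of_text>",
}

def toy_decode(token_ids):
    # 1. Convert token IDs to BPE subword strings
    subwords = [toy_decoder[t] for t in token_ids]
    # 2. Join the subwords together
    text = "".join(subwords)
    # 3. (The real tokenizer decodes a byte representation to UTF-8 here)
    # 4. Replace </w> word-boundary markers with spaces
    return text.replace("</w>", " ")

print(toy_decode([49406, 10, 11, 12, 49407]))
# → '<start_of_text>a photo <end_of_text>'
```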
Token ID Reference
| Token | ID | Description |
|---|---|---|
| <start_of_text> | 49406 | Start of sequence marker |
| <end_of_text> | 49407 | End of sequence marker |
| Padding | 0 | Zero padding |
Notes
- The function automatically moves tensors from GPU to CPU for decoding
- Decoded text includes special tokens (<start_of_text>, <end_of_text>)
- Padding tokens (ID: 0) decode to empty strings but may appear as spaces
- BPE word boundaries (</w>) are converted to spaces in the output
- This uses the module-level SimpleTokenizer instance
- For custom tokenizers, call the .decode() method on the tokenizer instance directly
Error Handling
import open_clip
import torch
# Out-of-vocabulary token IDs are not mapped by the tokenizer's decoder
# and will typically raise a KeyError, so guard untrusted input
invalid_tokens = torch.tensor([99999, 100000, 100001])
try:
    decoded = open_clip.decode(invalid_tokens)
    print(f"Decoded: {decoded}")
except KeyError as err:
    print(f"Unknown token ID: {err}")
# Invalid UTF-8 byte sequences within known tokens are replaced with the
# Unicode replacement character rather than raising an error
See Also