
Overview

Decodes token tensors back into human-readable text. Reverses the tokenization process by converting token IDs to their corresponding text representation.

Function Signature

def decode(output_ids: torch.Tensor) -> str

Parameters

output_ids : torch.Tensor (required)
    Tensor of token IDs to decode. Can be 1D (single sequence) or 2D (batch of sequences). The tensor is moved from GPU to CPU automatically if needed.

Returns

text : str
    Decoded text string. Special tokens (SOT, EOT) and padding are included in the output; </w> BPE word-boundary markers are converted to spaces.

Examples

Basic decoding

import open_clip
import torch

# Tokenize text
text = "a photo of a cat"
tokens = open_clip.tokenize(text)

# Decode back to text
decoded = open_clip.decode(tokens[0])
print(decoded)
# Includes <start_of_text>, the text, and <end_of_text>, followed by decoded padding

Decode batch of tokens

# Tokenize multiple texts
texts = ["a cat", "a dog", "a bird"]
tokens = open_clip.tokenize(texts)

# Decode each sequence
for i, token_seq in enumerate(tokens):
    decoded = open_clip.decode(token_seq)
    print(f"Text {i}: {decoded}")

Decode model predictions

import torch
import open_clip

# Load model
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32', 
    pretrained='laion2b_s34b_b79k'
)

# Tokenize text (note: tokenize returns token IDs, not embeddings)
text = "a photo of a cat"
tokens = open_clip.tokenize(text)

# Decode the tokens
decoded = open_clip.decode(tokens[0])
print(f"Original: {text}")
print(f"Decoded: {decoded}")

Handle padding and special tokens

import open_clip

# Short text with padding
text = "cat"
tokens = open_clip.tokenize(text)
print(f"Token shape: {tokens.shape}")  # torch.Size([1, 77])

# The decoded string includes the padding tokens after <end_of_text>
decoded = open_clip.decode(tokens[0])
print(f"Decoded: '{decoded}'")
# Contains: <start_of_text> cat <end_of_text> followed by padding

Remove special tokens

import open_clip

text = "a photo of a cat"
tokens = open_clip.tokenize(text)
decoded = open_clip.decode(tokens[0])

# Strip the start token and drop everything from <end_of_text> onward (padding)
cleaned = decoded.split('<end_of_text>')[0].replace('<start_of_text>', '').strip()
print(f"Cleaned: {cleaned}")
# Output: 'a photo of a cat'

Decode only non-padding tokens

import open_clip
import torch

text = "hello world"
tokens = open_clip.tokenize(text)

# Find non-zero (non-padding) tokens
non_padding_mask = tokens[0] != 0
non_padding_tokens = tokens[0][non_padding_mask]

# Decode only the actual content
decoded = open_clip.decode(non_padding_tokens)
print(decoded)

Decoding Process

The decode function:
  1. Converts token IDs to BPE subword strings
  2. Joins subwords together
  3. Decodes byte representation to UTF-8 text
  4. Replaces </w> markers with spaces
  5. Handles special tokens like <start_of_text> and <end_of_text>
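As a rough illustration, the steps above can be sketched with a toy vocabulary (these IDs and subwords are invented for this sketch; the real CLIP vocabulary has 49,408 entries and also round-trips through a byte-to-unicode map):

```python
# Toy decoder table: token ID -> BPE subword string.
# A trailing </w> marks the end of a word, as in the real BPE vocab.
decoder = {
    1: "a</w>",
    2: "pho",
    3: "to</w>",
    4: "cat</w>",
}

def toy_decode(token_ids):
    # Steps 1-2: convert IDs to subword strings and join them
    text = "".join(decoder[t] for t in token_ids)
    # Step 3 (elided here): the real decoder maps the joined string back
    # through a byte table and decodes UTF-8 with errors="replace"
    # Step 4: replace </w> word-boundary markers with spaces
    return text.replace("</w>", " ").strip()

print(toy_decode([1, 2, 3, 4]))  # -> 'a photo cat'
```

Note that "pho" carries no </w>, so it fuses with the following subword "to</w>" to form the single word "photo".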

Token ID Reference

Token           | ID    | Description
----------------|-------|-------------------------
<start_of_text> | 49406 | Start-of-sequence marker
<end_of_text>   | 49407 | End-of-sequence marker
(padding)       | 0     | Zero padding

Notes

  • The function automatically moves tensors from GPU to CPU for decoding
  • Decoded text includes special tokens (<start_of_text>, <end_of_text>)
  • Padding tokens (ID: 0) are decoded like any other vocabulary entry, so padded regions of the sequence appear in the output after <end_of_text>
  • BPE word boundaries (</w>) are converted to spaces in the output
  • This uses the module-level SimpleTokenizer instance
  • For custom tokenizers, call the .decode() method on the tokenizer instance directly

Error Handling

import open_clip
import torch

# Out-of-vocabulary IDs are not replaced gracefully: the decoder's
# dictionary lookup raises KeyError for IDs outside the 49,408-token vocab
invalid_tokens = torch.tensor([99999, 100000, 100001])
try:
    decoded = open_clip.decode(invalid_tokens)
except KeyError as err:
    print(f"Unknown token ID: {err}")
