The Tokenizer class handles text encoding and decoding for language models.

Constructor

Create a tokenizer from a model.
import onnxruntime_genai as og

model = og.Model("/path/to/model")
tokenizer = og.Tokenizer(model)
Parameters:
  model (Model, required): The Model object to create the tokenizer from

Properties

bos_token_id

The beginning-of-sequence token ID.
bos_id = tokenizer.bos_token_id
Type: int. Token ID for the start of a sequence.

eos_token_ids

Array of end-of-sequence token IDs.
eos_ids = tokenizer.eos_token_ids
Type: numpy.ndarray. Array of token IDs that mark the end of generation.

pad_token_id

The padding token ID.
pad_id = tokenizer.pad_token_id
Type: int. Token ID used for padding sequences.
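These IDs are useful when inspecting model output. A minimal sketch (the model path is a placeholder):
import onnxruntime_genai as og

model = og.Model("/path/to/model")  # placeholder path
tokenizer = og.Tokenizer(model)

print("BOS:", tokenizer.bos_token_id)
print("EOS:", tokenizer.eos_token_ids)
print("PAD:", tokenizer.pad_token_id)

# Example check: did generation stop on an end-of-sequence token?
last_token = 2  # stand-in for the last generated token ID
if last_token in tokenizer.eos_token_ids:
    print("Generation ended on an EOS token")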

Methods

encode()

Encode a string into token IDs.
tokens = tokenizer.encode("Hello, world!")
Parameters:
  text (str, required): The text to encode
Returns:
  tokens (numpy.ndarray): Array of int32 token IDs

decode()

Decode token IDs back into text.
text = tokenizer.decode(tokens)
Parameters:
  tokens (numpy.ndarray, required): Array of int32 token IDs to decode
Returns:
  text (str): The decoded text string

encode_batch()

Encode multiple strings at once.
prompts = ["First prompt", "Second prompt", "Third prompt"]
input_tokens = tokenizer.encode_batch(prompts)
Parameters:
  strings (list[str], required): List of text strings to encode
Returns:
  tokens (OgaTensor): Tensor containing all encoded sequences

decode_batch()

Decode multiple token sequences at once.
strings = tokenizer.decode_batch(tokens)
Parameters:
  tokens (OgaTensor, required): Tensor containing token sequences to decode
Returns:
  strings (list[str]): List of decoded text strings
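A round trip through both batch methods, assuming the tensor returned by encode_batch can be passed straight back to decode_batch (decoded strings may include padding tokens depending on tokenizer options):
prompts = ["First prompt", "Second prompt"]
batch = tokenizer.encode_batch(prompts)
texts = tokenizer.decode_batch(batch)
for original, decoded in zip(prompts, texts):
    print(original, "->", decoded)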

to_token_id()

Convert a token string to its ID.
token_id = tokenizer.to_token_id("hello")
Parameters:
  token (str, required): The token string to convert
Returns:
  id (int): The token ID
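Note that to_token_id looks up a single vocabulary entry, whereas encode may split the same string into several tokens. A quick comparison (exact IDs vary by model):
token_id = tokenizer.to_token_id("hello")
ids = tokenizer.encode("hello")
print(token_id)  # one vocabulary ID
print(ids)       # possibly several IDs, depending on the tokenizer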

apply_chat_template()

Apply a chat template to format messages.
import json

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What color is the sky?"}
]

prompt = tokenizer.apply_chat_template(
    messages=json.dumps(messages),
    add_generation_prompt=True
)
Parameters:
  messages (str, required): JSON-serialized list of message dictionaries with "role" and "content" fields
  template_str (str, default: None): Custom Jinja template string (uses the model's default if not provided)
  tools (str, default: None): JSON-serialized list of tool definitions
  add_generation_prompt (bool, default: True): Whether to add tokens indicating the assistant's turn to respond
Returns:
  prompt (str): The formatted prompt ready for encoding
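template_str overrides the model's built-in template. A minimal sketch using a toy Jinja template; real templates are model-specific, and the one below is illustrative only:
import json

messages = [{"role": "user", "content": "Hi"}]

# Toy template that concatenates role-tagged messages (illustrative only)
toy_template = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>{{ message['content'] }}<|end|>"
    "{% endfor %}"
)

prompt = tokenizer.apply_chat_template(
    messages=json.dumps(messages),
    template_str=toy_template,
    add_generation_prompt=False,
)
print(prompt)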

create_stream()

Create a streaming tokenizer for incremental decoding.
stream = tokenizer.create_stream()
Returns:
  stream (TokenizerStream): A TokenizerStream object for streaming decoding

update_options()

Update tokenizer options dynamically.
tokenizer.update_options(add_special_tokens="false", padding="max_length")
Parameters:
  **kwargs (dict): Key-value pairs of tokenizer options to update
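The supported keys are not enumerated here; reusing the option from the example above, one way to observe its effect (behavior is tokenizer-dependent):
before = tokenizer.encode("Hello")
tokenizer.update_options(add_special_tokens="false")
after = tokenizer.encode("Hello")
print(len(before), len(after))  # counts differ if special tokens were being added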

TokenizerStream

Streaming decoder for incremental token-by-token decoding.

decode()

Decode a single token and return the corresponding text chunk.
stream = tokenizer.create_stream()

# Assumes `generator` was created as in the streaming example under Example Usage
while not generator.is_done():
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(stream.decode(new_token), end="", flush=True)
Parameters:
  token (int, required): The token ID to decode
Returns:
  text (str): The text fragment for this token (may be empty for partial multi-byte characters)
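Because a fragment can be empty while the stream buffers a partial multi-byte character, concatenating all fragments is the reliable way to recover the text. A small self-contained check:
tokens = tokenizer.encode("héllo, wörld")  # multi-byte characters exercise the buffering
stream = tokenizer.create_stream()
pieces = [stream.decode(int(t)) for t in tokens]
print("".join(pieces))           # should match the full decode below
print(tokenizer.decode(tokens))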

Example Usage

Basic encoding and decoding:
import onnxruntime_genai as og

model = og.Model("/models/phi-3-mini")
tokenizer = og.Tokenizer(model)

# Encode
text = "The first 4 digits of pi are"
tokens = tokenizer.encode(text)
print(f"Encoded {len(tokens)} tokens")

# Decode
decoded = tokenizer.decode(tokens)
print(f"Decoded: {decoded}")
Streaming generation:
import onnxruntime_genai as og

model = og.Model("/models/phi-3-mini")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=200)

generator = og.Generator(model, params)
input_tokens = tokenizer.encode("Tell me a story")
generator.append_tokens(input_tokens)

print("Output: ", end="", flush=True)
while not generator.is_done():
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(stream.decode(new_token), end="", flush=True)
print()
Batch encoding:
import onnxruntime_genai as og

model = og.Model("/models/phi-3-mini")
tokenizer = og.Tokenizer(model)

prompts = [
    "The first 4 digits of pi are",
    "The square root of 2 is",
    "The capital of France is"
]

# Encode batch
input_tokens = tokenizer.encode_batch(prompts)
print(f"Encoded {len(prompts)} prompts")

# Generate for all prompts
params = og.GeneratorParams(model)
params.set_search_options(batch_size=len(prompts), max_length=100)

generator = og.Generator(model, params)
generator.append_tokens(input_tokens)

while not generator.is_done():
    generator.generate_next_token()

# Decode each sequence
for i in range(len(prompts)):
    output = tokenizer.decode(generator.get_sequence(i))
    print(f"Prompt {i}: {output}")
    print()
Chat template:
import onnxruntime_genai as og
import json

model = og.Model("/models/phi-3-mini")
tokenizer = og.Tokenizer(model)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What color is the sky?"}
]

prompt = tokenizer.apply_chat_template(
    messages=json.dumps(messages),
    add_generation_prompt=True
)

print(f"Formatted prompt: {prompt}")

input_tokens = tokenizer.encode(prompt)
# Continue with generation...
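For completeness, the generation loop mirrors the streaming example above:
params = og.GeneratorParams(model)
params.set_search_options(max_length=200)

generator = og.Generator(model, params)
generator.append_tokens(input_tokens)

stream = tokenizer.create_stream()
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()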
