The Tokenizer class handles text encoding and decoding for language models.
Constructor
Create a tokenizer from a model.
import onnxruntime_genai as og
model = og.Model("/path/to/model")
tokenizer = og.Tokenizer(model)
Parameters
The Model object to create the tokenizer from.
Properties
bos_token_id
The beginning-of-sequence token ID.
bos_id = tokenizer.bos_token_id
Returns
The token ID for the start of a sequence.
eos_token_ids
Array of end-of-sequence token IDs.
eos_ids = tokenizer.eos_token_ids
Returns
An array of token IDs that mark the end of generation.
pad_token_id
The padding token ID.
pad_id = tokenizer.pad_token_id
Returns
The token ID used for padding sequences.
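Taken together, these properties describe a model's special tokens. A minimal sketch that prints them, using only the properties documented above:
print(f"BOS token ID: {tokenizer.bos_token_id}")
print(f"EOS token IDs: {list(tokenizer.eos_token_ids)}")
print(f"PAD token ID: {tokenizer.pad_token_id}")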
Methods
encode()
Encode a string into token IDs.
tokens = tokenizer.encode("Hello, world!")
decode()
Decode token IDs back into text.
text = tokenizer.decode(tokens)
Parameters
Array of int32 token IDs to decode.
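encode() and decode() are inverses in the common case, so a round trip is a quick sanity check. A minimal sketch; the result can differ from the input if the tokenizer normalizes text or adds special tokens:
text = "Hello, world!"
round_trip = tokenizer.decode(tokenizer.encode(text))
print(round_trip)  # typically the original text, modulo normalization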
encode_batch()
Encode multiple strings at once.
prompts = ["First prompt", "Second prompt", "Third prompt"]
input_tokens = tokenizer.encode_batch(prompts)
Parameters
List of text strings to encode.
Returns
A Tensor containing all encoded sequences.
decode_batch()
Decode multiple token sequences at once.
strings = tokenizer.decode_batch(tokens)
Parameters
A Tensor containing the token sequences to decode.
Returns
A list of decoded text strings.
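The two batch methods compose naturally. A minimal round-trip sketch; as above, the output strings may differ from the inputs by padding or special tokens:
prompts = ["First prompt", "Second prompt"]
batch = tokenizer.encode_batch(prompts)
texts = tokenizer.decode_batch(batch)
print(texts)  # one decoded string per input prompt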
to_token_id()
Convert a token string to its ID.
token_id = tokenizer.to_token_id("hello")
Parameters
The token string to convert.
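A common use is looking up the ID of a control token, for example to check for it during generation. A sketch; the token string "<|end|>" is a model-specific assumption, so substitute a token that exists in your model's vocabulary:
# "<|end|>" is hypothetical; control tokens vary by model
end_id = tokenizer.to_token_id("<|end|>")
print(end_id)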
apply_chat_template()
Apply a chat template to format messages.
import json
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What color is the sky?"}
]
prompt = tokenizer.apply_chat_template(
    messages=json.dumps(messages),
    add_generation_prompt=True
)
Parameters
JSON-serialized list of message dictionaries with "role" and "content" fields.
Custom Jinja template string (the model's default template is used if not provided).
JSON-serialized list of tool definitions.
Whether to add the tokens indicating the assistant's turn to respond.
Returns
The formatted prompt, ready for encoding.
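Tool definitions can be supplied the same way. A hedged sketch building on the example above; the tools keyword and the tool schema shown here are assumptions inferred from the parameter list, not a confirmed signature:
# hypothetical tool definition; the exact schema depends on the model's chat template
tools = [{
    "type": "function",
    "function": {"name": "get_weather", "description": "Get the weather for a city"}
}]
prompt = tokenizer.apply_chat_template(
    messages=json.dumps(messages),  # messages as defined in the example above
    tools=json.dumps(tools),        # keyword name assumed from the parameter list
    add_generation_prompt=True
)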
create_stream()
Create a streaming tokenizer for incremental decoding.
stream = tokenizer.create_stream()
Returns
A TokenizerStream object for streaming decoding.
update_options()
Update tokenizer options dynamically.
tokenizer.update_options(add_special_tokens="false", padding="max_length")
Parameters
Key-value pairs of tokenizer options to update.
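Options are plain key-value pairs, so a setting can be flipped back the same way. A sketch; only the keys shown in the example above are assumed, and the full set of supported options is model-dependent:
# revert the option set in the example above (key assumed from that example)
tokenizer.update_options(add_special_tokens="true")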
TokenizerStream
Streaming decoder for incremental token-by-token decoding.
decode()
Decode a single token and return the corresponding text chunk.
stream = tokenizer.create_stream()
# assumes an og.Generator mid-generation, as in Example Usage below
while not generator.is_done():
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(stream.decode(new_token), end="", flush=True)
Returns
The text fragment for this token (may be empty for partial multi-byte characters).
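Because a single token can encode only part of a multi-byte character, the stream may return empty fragments until the character completes; joining all fragments yields the full text. A minimal sketch, assuming tokens is a list of generated token IDs and stream is a fresh TokenizerStream:
# empty fragments simply disappear in the join
fragments = [stream.decode(t) for t in tokens]
print("".join(fragments))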
Example Usage
Basic encoding and decoding:
import onnxruntime_genai as og
model = og.Model("/models/phi-3-mini")
tokenizer = og.Tokenizer(model)
# Encode
text = "The first 4 digits of pi are"
tokens = tokenizer.encode(text)
print(f"Encoded {len(tokens)} tokens")
# Decode
decoded = tokenizer.decode(tokens)
print(f"Decoded: {decoded}")
Streaming generation:
import onnxruntime_genai as og
model = og.Model("/models/phi-3-mini")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()
params = og.GeneratorParams(model)
params.set_search_options(max_length=200)
generator = og.Generator(model, params)
input_tokens = tokenizer.encode("Tell me a story")
generator.append_tokens(input_tokens)
print("Output: ", end="", flush=True)
while not generator.is_done():
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(stream.decode(new_token), end="", flush=True)
print()
Batch encoding:
import onnxruntime_genai as og
model = og.Model("/models/phi-3-mini")
tokenizer = og.Tokenizer(model)
prompts = [
    "The first 4 digits of pi are",
    "The square root of 2 is",
    "The capital of France is"
]
# Encode batch
input_tokens = tokenizer.encode_batch(prompts)
print(f"Encoded {len(prompts)} prompts")
# Generate for all prompts
params = og.GeneratorParams(model)
params.set_search_options(batch_size=len(prompts), max_length=100)
generator = og.Generator(model, params)
generator.append_tokens(input_tokens)
while not generator.is_done():
    generator.generate_next_token()
# Decode each sequence
for i in range(len(prompts)):
    output = tokenizer.decode(generator.get_sequence(i))
    print(f"Prompt {i}: {output}")
    print()
Chat template:
import onnxruntime_genai as og
import json
model = og.Model("/models/phi-3-mini")
tokenizer = og.Tokenizer(model)
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What color is the sky?"}
]
prompt = tokenizer.apply_chat_template(
    messages=json.dumps(messages),
    add_generation_prompt=True
)
print(f"Formatted prompt: {prompt}")
input_tokens = tokenizer.encode(prompt)
# Continue with generation...
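From here, generation proceeds exactly as in the streaming example above. A minimal sketch reusing only calls shown earlier on this page:
params = og.GeneratorParams(model)
params.set_search_options(max_length=200)
generator = og.Generator(model, params)
generator.append_tokens(input_tokens)
stream = tokenizer.create_stream()
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()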