The `Model` class loads ONNX models optimized for text generation using ONNX Runtime.
## Constructor
Create a model from a configuration path or Config object.
```python
import onnxruntime_genai as og

# From path
model = og.Model("/path/to/model")

# From Config object
config = og.Config("/path/to/model")
model = og.Model(config)
```
**Parameters** (one of):
- Path to the model directory containing `genai_config.json` and `model.onnx`
- A `Config` object with custom execution provider settings
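Before constructing a `Model`, it can be useful to check that a directory actually provides the files the runtime expects. A minimal sketch (the helper below is illustrative, not part of the API; onnxruntime-genai performs its own validation and raises if the configuration is missing or invalid):

```python
# Required files per the parameter description above.
REQUIRED_FILES = {"genai_config.json", "model.onnx"}

def missing_model_files(present_files) -> set:
    """Return which required model files are absent.

    Illustrative helper only: pass the filenames found in the model
    directory (e.g. from pathlib's iterdir) and get back the missing set.
    """
    return REQUIRED_FILES - set(present_files)
```

Calling this before `og.Model(path)` lets an application report a clearer error than a generic load failure.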
## Properties

### type
The model architecture type (e.g., `"phi3"`, `"llama"`, `"phi3v"`, `"phi4mm"`).
```python
model_type = model.type
print(f"Model type: {model_type}")
```
**Returns:** Model architecture identifier from the configuration
### device_type
The device type the model is running on.
```python
device = model.device_type
print(f"Running on: {device}")
```
**Returns:** Device identifier (e.g., `"CPU"`, `"CUDA"`, `"DML"`)
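Code that adapts to the active device can branch on this string. A small sketch (the mapping is an assumption for illustration; the exact set of identifiers depends on the build and available execution providers):

```python
def describe_device(device_type: str) -> str:
    # Hypothetical mapping for the identifiers named above ("CPU",
    # "CUDA", "DML"); unknown values fall through to a default.
    descriptions = {
        "CPU": "CPU execution provider",
        "CUDA": "NVIDIA GPU via CUDA",
        "DML": "GPU via DirectML",
    }
    return descriptions.get(device_type, f"unknown device: {device_type}")
```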
## Methods

### create_multimodal_processor()
Create a multimodal processor for models that support images and audio.
```python
processor = model.create_multimodal_processor()
```
**Returns:** Processor for encoding images, audio, and text together
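Multimodal prompts reference images through numbered placeholder tags such as `<|image_1|>`, as in the multimodal example below. A helper to build such a prompt for several images might look like this (hypothetical; the tag format follows the Phi-3 vision convention used in this page's example):

```python
def build_image_prompt(question: str, num_images: int) -> str:
    # One "<|image_k|>" tag per image, 1-indexed, each on its own line,
    # followed by the text question.
    tags = "".join(f"<|image_{i}|>\n" for i in range(1, num_images + 1))
    return tags + question
```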
## Example Usage
Basic text generation setup:
```python
import onnxruntime_genai as og

# Load model
model = og.Model("/models/phi-3-mini")
print(f"Loaded {model.type} on {model.device_type}")

# Create tokenizer
tokenizer = og.Tokenizer(model)

# Create generator params
params = og.GeneratorParams(model)
params.set_search_options(max_length=200, top_p=0.9, temperature=0.7)

# Create generator
generator = og.Generator(model, params)

# Encode and generate
input_tokens = tokenizer.encode("The first 4 digits of pi are")
generator.append_tokens(input_tokens)
while not generator.is_done():
    generator.generate_next_token()

output = tokenizer.decode(generator.get_sequence(0))
print(output)
```
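The loop above follows a standard pattern: append the prompt tokens, then step the generator one token at a time until it reports completion. Stripped of the library, the control flow looks like this (a toy stand-in that only mimics the `append_tokens`/`is_done`/`generate_next_token`/`get_sequence` surface; the real generator stops on end-of-sequence tokens and runs the actual model):

```python
class ToyGenerator:
    """Toy stand-in illustrating the Generator control surface."""

    def __init__(self, max_length: int):
        self.tokens = []
        self.max_length = max_length

    def append_tokens(self, tokens):
        self.tokens.extend(tokens)

    def is_done(self) -> bool:
        # Real generators also stop on an end-of-sequence token.
        return len(self.tokens) >= self.max_length

    def generate_next_token(self):
        # Stand-in "model": emit an incrementing token id.
        self.tokens.append(len(self.tokens))

    def get_sequence(self, index: int):
        return self.tokens

gen = ToyGenerator(max_length=5)
gen.append_tokens([10, 11])          # prompt tokens
while not gen.is_done():
    gen.generate_next_token()
# gen.get_sequence(0) is now [10, 11, 2, 3, 4]:
# the prompt tokens followed by three generated ids.
```

Note that the returned sequence includes the prompt tokens, which is why the real example decodes the full sequence and the output begins with the prompt text.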
With custom execution provider:
```python
import onnxruntime_genai as og

# Create config and set CUDA provider
config = og.Config("/models/phi-3-mini")
config.clear_providers()
config.append_provider("cuda")
config.set_provider_option("cuda", "enable_cuda_graph", "1")

# Create model with custom config
model = og.Model(config)
print(f"Model running on {model.device_type}")
```
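If the preferred provider may be unavailable on a given machine, a common pattern is an ordered preference list with a CPU fallback. The sketch below shows only the selection logic; the `available` set is a stand-in for a real capability check, and in practice `og.Model(config)` will raise if a provider cannot be initialized:

```python
def pick_provider(preferred, available) -> str:
    """Return the first preferred provider that is available, else "cpu".

    Illustrative only: `available` stands in for a real runtime check.
    """
    for name in preferred:
        if name in available:
            return name
    return "cpu"
```

The chosen name would then be passed to `config.append_provider(...)` in place of the hard-coded `"cuda"` above.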
Multimodal example:
```python
import onnxruntime_genai as og

# Load vision-language model
model = og.Model("/models/phi-3-vision")

# Create processor for multimodal inputs
processor = model.create_multimodal_processor()

# Load image
images = og.Images.open("image.jpg")

# Process prompt with image
prompt = "<|image_1|>\nWhat is in this image?"
inputs = processor(prompt, images=images)

# Generate
params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
generator = og.Generator(model, params)
generator.set_inputs(inputs)
while not generator.is_done():
    generator.generate_next_token()

output = processor.decode(generator.get_sequence(0))
print(output)
```
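Since `get_sequence(0)` returns the full token sequence, prompt included, the decoded string typically begins with the prompt text. A small post-processing helper to trim it (illustrative; whether this is needed depends on your prompt template):

```python
def strip_prompt(decoded: str, prompt: str) -> str:
    # Remove the echoed prompt prefix from decoded output, if present.
    if decoded.startswith(prompt):
        return decoded[len(prompt):].lstrip()
    return decoded
```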