The Model class loads ONNX models optimized for text generation and runs them with ONNX Runtime.

Constructor

Create a model from a model directory path or a Config object.
import onnxruntime_genai as og

# From path
model = og.Model("/path/to/model")

# From Config object
config = og.Config("/path/to/model")
model = og.Model(config)
Parameters:
- config_path (str): Path to the model directory containing genai_config.json and model.onnx
- config (Config): A Config object with custom execution provider settings

Properties

type

The model architecture type (e.g., “phi3”, “llama”, “phi3v”, “phi4mm”).
model_type = model.type
print(f"Model type: {model_type}")
Returns:
- str: Model architecture identifier from the configuration

device_type

The device type the model is running on.
device = model.device_type
print(f"Running on: {device}")
Returns:
- str: Device identifier (e.g., “CPU”, “CUDA”, “DML”)

Methods

create_multimodal_processor()

Create a multimodal processor for models that support images and audio.
processor = model.create_multimodal_processor()
Returns:
- MultiModalProcessor: Processor for encoding images, audio, and text together

Example Usage

Basic text generation setup:
import onnxruntime_genai as og

# Load model
model = og.Model("/models/phi-3-mini")
print(f"Loaded {model.type} on {model.device_type}")

# Create tokenizer
tokenizer = og.Tokenizer(model)

# Create generator params
params = og.GeneratorParams(model)
params.set_search_options(max_length=200, top_p=0.9, temperature=0.7)

# Create generator
generator = og.Generator(model, params)

# Encode and generate
input_tokens = tokenizer.encode("The first 4 digits of pi are")
generator.append_tokens(input_tokens)

while not generator.is_done():
    generator.generate_next_token()

output = tokenizer.decode(generator.get_sequence(0))
print(output)
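Decoding the whole sequence at the end works, but for interactive use you can stream text as each token is produced using a TokenizerStream, which detokenizes incrementally. A sketch of the same loop as above (the model path is a placeholder):

```python
import onnxruntime_genai as og

model = og.Model("/models/phi-3-mini")  # placeholder path
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()  # incremental detokenizer

params = og.GeneratorParams(model)
params.set_search_options(max_length=200)
generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("The first 4 digits of pi are"))

# Print each piece of text as soon as its token is generated
while not generator.is_done():
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)
print()
```

Streaming avoids re-decoding the growing sequence on every step and lets the caller display partial output immediately.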
With custom execution provider:
import onnxruntime_genai as og

# Create config and set CUDA provider
config = og.Config("/models/phi-3-mini")
config.clear_providers()
config.append_provider("cuda")
config.set_provider_option("cuda", "enable_cuda_graph", "1")

# Create model with custom config
model = og.Model(config)
print(f"Model running on {model.device_type}")
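If the requested provider is not available in the installed onnxruntime-genai package, constructing the model raises an error. One way to handle that is to fall back to the default CPU provider; this is a sketch under the assumption that the config can be reconfigured and reused after a failed Model construction:

```python
import onnxruntime_genai as og

config = og.Config("/models/phi-3-mini")  # placeholder path
config.clear_providers()
config.append_provider("cuda")

try:
    model = og.Model(config)
except Exception:
    # CUDA provider unavailable in this build/environment: fall back to CPU
    config.clear_providers()
    model = og.Model(config)

print(f"Model running on {model.device_type}")
```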
Multimodal example:
import onnxruntime_genai as og

# Load vision-language model
model = og.Model("/models/phi-3-vision")

# Create processor for multimodal inputs
processor = model.create_multimodal_processor()

# Load image
images = og.Images.open("image.jpg")

# Process prompt with image
prompt = "<|image_1|>\nWhat is in this image?"
inputs = processor(prompt, images=images)

# Generate
params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
generator = og.Generator(model, params)
generator.set_inputs(inputs)

while not generator.is_done():
    generator.generate_next_token()

output = processor.decode(generator.get_sequence(0))
print(output)
