The Model class loads ONNX models optimized for text generation and runs them with ONNX Runtime.

Constructor

Create a model from a model directory path or a Config object.
import onnxruntime_genai as og

# From path
model = og.Model("/path/to/model")

# From Config object
config = og.Config("/path/to/model")
model = og.Model(config)
Parameters:
- config_path (str): Path to the model directory containing genai_config.json and model.onnx
- config (Config): A Config object with custom execution provider settings

Properties

type

The model architecture type (e.g., “phi3”, “llama”, “phi3v”, “phi4mm”).
model_type = model.type
print(f"Model type: {model_type}")
Returns:
- str: Model architecture identifier from the configuration

device_type

The device type the model is running on.
device = model.device_type
print(f"Running on: {device}")
Returns:
- str: Device identifier (e.g., “CPU”, “CUDA”, “DML”)

Methods

create_multimodal_processor()

Create a multimodal processor for models that support images and audio.
processor = model.create_multimodal_processor()
Returns:
- MultiModalProcessor: Processor for encoding images, audio, and text together

Example Usage

Basic text generation setup:
import onnxruntime_genai as og

# Load model
model = og.Model("/models/phi-3-mini")
print(f"Loaded {model.type} on {model.device_type}")

# Create tokenizer
tokenizer = og.Tokenizer(model)

# Create generator params
params = og.GeneratorParams(model)
params.set_search_options(max_length=200, top_p=0.9, temperature=0.7)

# Create generator
generator = og.Generator(model, params)

# Encode and generate
input_tokens = tokenizer.encode("The first 4 digits of pi are")
generator.append_tokens(input_tokens)

while not generator.is_done():
    generator.generate_next_token()

output = tokenizer.decode(generator.get_sequence(0))
print(output)
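Decoding the whole sequence at the end works, but for interactive use you can stream text as each token is produced using a TokenizerStream, which detokenizes incrementally. A sketch of the same loop as above (the model path is a placeholder):

```python
import onnxruntime_genai as og

model = og.Model("/models/phi-3-mini")  # placeholder path
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()  # incremental detokenizer

params = og.GeneratorParams(model)
params.set_search_options(max_length=200)
generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("The first 4 digits of pi are"))

# Print each piece of text as soon as its token is generated
while not generator.is_done():
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)
print()
```

Streaming avoids re-decoding the growing sequence on every step and lets the caller display partial output immediately.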
With custom execution provider:
import onnxruntime_genai as og

# Create config and set CUDA provider
config = og.Config("/models/phi-3-mini")
config.clear_providers()
config.append_provider("cuda")
config.set_provider_option("cuda", "enable_cuda_graph", "1")

# Create model with custom config
model = og.Model(config)
print(f"Model running on {model.device_type}")
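If the requested provider is not available in the installed onnxruntime-genai package, constructing the model raises an error. One way to handle that is to fall back to the default CPU provider; this is a sketch under the assumption that the config can be reconfigured and reused after a failed Model construction:

```python
import onnxruntime_genai as og

config = og.Config("/models/phi-3-mini")  # placeholder path
config.clear_providers()
config.append_provider("cuda")

try:
    model = og.Model(config)
except Exception:
    # CUDA provider unavailable in this build/environment: fall back to CPU
    config.clear_providers()
    model = og.Model(config)

print(f"Model running on {model.device_type}")
```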
Multimodal example:
import onnxruntime_genai as og

# Load vision-language model
model = og.Model("/models/phi-3-vision")

# Create processor for multimodal inputs
processor = model.create_multimodal_processor()

# Load image
images = og.Images.open("image.jpg")

# Process prompt with image
prompt = "<|image_1|>\nWhat is in this image?"
inputs = processor(prompt, images=images)

# Generate
params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
generator = og.Generator(model, params)
generator.set_inputs(inputs)

while not generator.is_done():
    generator.generate_next_token()

output = processor.decode(generator.get_sequence(0))
print(output)
