The Open Neural Network Exchange (ONNX) format is an open standard for representing machine learning models. ONNX Runtime uses this format as its primary input for inference and training.

What is ONNX?

ONNX provides a common format for representing deep learning models, enabling interoperability between different frameworks:
  • Framework Agnostic: Export from PyTorch, TensorFlow, scikit-learn, and more
  • Standardized Operators: Well-defined operator specifications with versioning
  • Portable: Run models across different hardware and platforms
  • Extensible: Support for custom operators and domains

ONNX Format Structure

An ONNX model consists of several key components:

ModelProto

The top-level container for an ONNX model:
message ModelProto {
  int64 ir_version = 1;
  repeated OperatorSetIdProto opset_import = 8;
  string producer_name = 2;
  string producer_version = 3;
  string domain = 4;
  int64 model_version = 5;
  string doc_string = 6;
  GraphProto graph = 7;
  repeated StringStringEntryProto metadata_props = 14;
}
The ir_version field records which version of the ONNX IR (Intermediate Representation) the model conforms to. New IR versions are introduced periodically; the `onnx.IR_VERSION` constant reports the latest version supported by your installed `onnx` package.

GraphProto

Represents the computational graph:
message GraphProto {
  repeated NodeProto node = 1;              // Computation nodes
  string name = 2;                          // Graph name
  repeated TensorProto initializer = 5;     // Constant tensors (weights)
  string doc_string = 10;                   // Documentation
  repeated ValueInfoProto input = 11;       // Graph inputs
  repeated ValueInfoProto output = 12;      // Graph outputs
  repeated ValueInfoProto value_info = 13;  // Intermediate values
}

NodeProto

Defines individual operators in the graph:
message NodeProto {
  repeated string input = 1;        // Input tensor names
  repeated string output = 2;       // Output tensor names
  string name = 3;                  // Node name
  string op_type = 4;               // Operator type (e.g., "Conv", "Relu")
  string domain = 7;                // Operator domain
  repeated AttributeProto attribute = 5;  // Operator attributes
}

Model Components

Nodes (Operators)

Nodes represent operations in the computation graph:
{
  "input": ["data", "conv_weight", "conv_bias"],
  "output": ["conv_output"],
  "name": "conv1",
  "op_type": "Conv",
  "domain": "",  # Empty string = ai.onnx domain
  "attribute": [
    {"name": "kernel_shape", "ints": [3, 3]},
    {"name": "strides", "ints": [1, 1]},
    {"name": "pads", "ints": [1, 1, 1, 1]}
  ]
}

Initializers (Constants)

Initializers store constant tensors like model weights:
  • Embedded directly in the model file
  • Can be stored externally for large models
  • Typically used for learned parameters
# Accessing initializers in ONNX Runtime
import onnx

model = onnx.load("model.onnx")
for initializer in model.graph.initializer:
    print(f"Name: {initializer.name}, Shape: {initializer.dims}")

Inputs and Outputs

Define the model’s interface:
# ValueInfoProto structure
{
  "name": "input_tensor",
  "type": {
    "tensor_type": {
      "elem_type": 1,  # FLOAT
      "shape": {
        "dim": [
          {"dim_param": "batch_size"},  # Dynamic dimension
          {"dim_value": 3},              # Static dimension
          {"dim_value": 224},
          {"dim_value": 224}
        ]
      }
    }
  }
}
ONNX supports various tensor element types (values from the `TensorProto.DataType` enum; abridged):

Type          Value   Description
FLOAT            1    32-bit floating point
UINT8            2    8-bit unsigned integer
INT8             3    8-bit signed integer
UINT16           4    16-bit unsigned integer
INT16            5    16-bit signed integer
INT32            6    32-bit signed integer
INT64            7    64-bit signed integer
STRING           8    String type
BOOL             9    Boolean type
FLOAT16         10    16-bit floating point
DOUBLE          11    64-bit floating point
UINT32          12    32-bit unsigned integer
UINT64          13    64-bit unsigned integer
BFLOAT16        16    Brain floating point
FLOAT8E4M3FN    17    8-bit floating point (E4M3)
UINT4           21    4-bit unsigned integer (packed)
INT4            22    4-bit signed integer (packed)
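The enum values above can be wrapped in a small lookup helper. This sketch hard-codes the subset from the table; with `onnx` installed, `onnx.TensorProto.DataType.Name(value)` performs the same mapping from the official enum.

```python
# Subset of TensorProto.DataType values from the table above
ELEM_TYPE_NAMES = {
    1: "FLOAT", 2: "UINT8", 3: "INT8", 4: "UINT16", 5: "INT16",
    6: "INT32", 7: "INT64", 8: "STRING", 9: "BOOL", 10: "FLOAT16",
    11: "DOUBLE", 12: "UINT32", 13: "UINT64", 16: "BFLOAT16",
    17: "FLOAT8E4M3FN", 21: "UINT4", 22: "INT4",
}

def elem_type_name(value: int) -> str:
    """Map a TensorProto elem_type integer to a readable name."""
    return ELEM_TYPE_NAMES.get(value, f"UNKNOWN({value})")

print(elem_type_name(1))   # FLOAT
print(elem_type_name(22))  # INT4
```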

Operator Sets (OpSets)

ONNX uses versioned operator sets to ensure compatibility:
# OpSet imports in the model
opset_import {
  domain: ""               # ai.onnx domain
  version: 18              # OpSet version 18
}
opset_import {
  domain: "com.microsoft"  # Custom domain
  version: 1
}
ONNX Runtime supports multiple OpSet versions simultaneously. Models are compatible as long as the runtime supports the required OpSet version.
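The compatibility rule can be sketched with plain dictionaries: for each domain a model imports, the runtime must know the domain and support at least that opset version. The supported versions below are illustrative, not actual ONNX Runtime values.

```python
# Highest opset version a hypothetical runtime supports, per domain
SUPPORTED = {"": 20, "com.microsoft": 1}

def is_compatible(model_imports: dict) -> bool:
    """True if every imported domain is known and its version is supported."""
    return all(
        domain in SUPPORTED and version <= SUPPORTED[domain]
        for domain, version in model_imports.items()
    )

print(is_compatible({"": 18, "com.microsoft": 1}))  # True
print(is_compatible({"": 25}))                      # False: opset too new
```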

OpSet Evolution

Operator definitions evolve across versions:
  • New operators: Added in newer OpSets
  • Updated semantics: Changes to existing operators
  • Deprecated operators: Old operators may be removed
  • Attribute changes: New or modified operator attributes

ORT Format

ONNX Runtime also supports its own optimized format (ORT format):

ONNX Format

  • Standard ONNX protobuf format
  • Portable across runtimes
  • Human-readable (with tools)
  • Larger file size

ORT Format

  • Optimized for ONNX Runtime
  • Faster loading time
  • Smaller file size
  • Pre-applied optimizations

Converting to ORT Format

import onnxruntime as ort

# Create session with optimization
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.optimized_model_filepath = "model.ort"

# This creates the .ort file
session = ort.InferenceSession("model.onnx", sess_options)
ORT format models are version-specific: a model saved with one ONNX Runtime version may not load in another.

External Data

Large models can store tensors externally:

External Data Configuration

import onnx

# Save with external data
onnx.save_model(
    model,
    "model.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="weights.bin",
    size_threshold=1024,  # Tensors > 1KB stored externally
    convert_attribute=False
)
External data is useful for:
  • Models larger than 2GB (protobuf limit)
  • Faster git operations (diff, clone)
  • Separate weight management

Subgraphs and Control Flow

ONNX supports control flow operators with subgraphs:

If Operator

# If node with two subgraphs
{
  "op_type": "If",
  "input": ["condition"],
  "output": ["result"],
  "attribute": [
    {
      "name": "then_branch",
      "type": "GRAPH",
      "g": <GraphProto>  # Then branch subgraph
    },
    {
      "name": "else_branch",
      "type": "GRAPH",
      "g": <GraphProto>  # Else branch subgraph
    }
  ]
}

Loop Operator

Implements iterative computation:
{
  "op_type": "Loop",
  "input": ["max_trip_count", "condition", "loop_state"],
  "output": ["final_state", "scan_outputs"],
  "attribute": [
    {
      "name": "body",
      "type": "GRAPH",
      "g": <GraphProto>  # Loop body subgraph
    }
  ]
}

Model Metadata

Models can include custom metadata:
import onnx
from onnx import helper

model = onnx.load("model.onnx")

# Add metadata: set_model_props replaces metadata_props
# with the given key/value pairs
helper.set_model_props(model, {
    "author": "Your Name",
    "license": "MIT",
    "description": "Image classifier",
})

onnx.save(model, "model_with_metadata.onnx")

Inspecting ONNX Models

import onnx

model = onnx.load("model.onnx")

# Print model structure
print(f"Producer: {model.producer_name} {model.producer_version}")
print(f"IR version: {model.ir_version}")
print(f"OpSet version: {model.opset_import[0].version}")

# Print graph info
graph = model.graph
print(f"\nGraph: {graph.name}")
print(f"Inputs: {len(graph.input)}")
print(f"Outputs: {len(graph.output)}")
print(f"Nodes: {len(graph.node)}")
print(f"Initializers: {len(graph.initializer)}")

# Print input/output details
for inp in graph.input:
    print(f"\nInput: {inp.name}")
    print(f"  Type: {inp.type.tensor_type.elem_type}")
    print(f"  Shape: {[d.dim_value or d.dim_param for d in inp.type.tensor_type.shape.dim]}")

Best Practices

Define dynamic dimensions with names instead of -1:
# Good
input_tensor.type.tensor_type.shape.dim[0].dim_param = "batch_size"

# Avoid
input_tensor.type.tensor_type.shape.dim[0].dim_value = -1
Always optimize models before deployment:
  • Use graph optimizations
  • Consider quantization
  • Convert to ORT format for production
Use the model_version field to track model versions:
model.model_version = 2
Add documentation strings and metadata:
model.doc_string = "ResNet-50 image classifier trained on ImageNet"
model.graph.doc_string = "Main inference graph"

Next Steps

Execution Providers

Learn how execution providers accelerate model inference

Graph Optimizations

Understand optimization techniques for better performance

Sessions

Deep dive into InferenceSession configuration

Custom Operators

Learn how to add custom operators