What is ONNX?
ONNX provides a common format for representing deep learning models, enabling interoperability between different frameworks:

- Framework Agnostic: Export from PyTorch, TensorFlow, scikit-learn, and more
- Standardized Operators: Well-defined operator specifications with versioning
- Portable: Run models across different hardware and platforms
- Extensible: Support for custom operators and domains
ONNX Format Structure
An ONNX model consists of several key components:

ModelProto
The top-level container for an ONNX model. The ir_version field indicates the ONNX IR (Intermediate Representation) version, currently at version 9.

GraphProto
Represents the computational graph: its nodes, initializers, inputs, and outputs.

NodeProto
Defines individual operators in the graph.

Model Components
Nodes (Operators)
Nodes represent operations in the computation graph.

Initializers (Constants)
Initializers store constant tensors like model weights:
- Embedded directly in the model file
- Can be stored externally for large models
- Typically used for learned parameters
Inputs and Outputs
Define the model’s interface: each graph input and output is a ValueInfoProto with a name, element type, and shape.
Supported Tensor Data Types
ONNX supports various tensor element types:
| Type | Value | Description |
|---|---|---|
| FLOAT | 1 | 32-bit floating point |
| UINT8 | 2 | 8-bit unsigned integer |
| INT8 | 3 | 8-bit signed integer |
| UINT16 | 4 | 16-bit unsigned integer |
| INT16 | 5 | 16-bit signed integer |
| INT32 | 6 | 32-bit signed integer |
| INT64 | 7 | 64-bit signed integer |
| STRING | 8 | String type |
| BOOL | 9 | Boolean type |
| FLOAT16 | 10 | 16-bit floating point |
| DOUBLE | 11 | 64-bit floating point |
| UINT32 | 12 | 32-bit unsigned integer |
| UINT64 | 13 | 64-bit unsigned integer |
| BFLOAT16 | 16 | Brain floating point |
| FLOAT8E4M3FN | 17 | 8-bit floating point (E4M3) |
| UINT4 | 21 | 4-bit unsigned integer (packed) |
| INT4 | 22 | 4-bit signed integer (packed) |
Operator Sets (OpSets)
ONNX uses versioned operator sets to ensure compatibility. ONNX Runtime supports multiple OpSet versions simultaneously; a model is compatible as long as the runtime supports the OpSet version it requires.
OpSet Evolution
Operator definitions evolve across versions:

- New operators: Added in newer OpSets
- Updated semantics: Changes to existing operators
- Deprecated operators: Old operators may be removed
- Attribute changes: New or modified operator attributes
ORT Format
ONNX Runtime also supports its own optimized format (ORT format). The two formats compare as follows:

ONNX Format
- Standard ONNX protobuf format
- Portable across runtimes
- Human-readable (with tools)
- Larger file size
ORT Format
- Optimized for ONNX Runtime
- Faster loading time
- Smaller file size
- Pre-applied optimizations
Converting to ORT Format
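Conversion is typically done with the tool shipped in the `onnxruntime` Python package; a sketch (the filename `model.onnx` is a placeholder):

```shell
# Convert an ONNX model to the optimized ORT format.
# Produces model.ort alongside the input file.
python -m onnxruntime.tools.convert_onnx_models_to_ort model.onnx
```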
External Data
Large models can store tensors externally so that the protobuf file itself stays small.

External Data Configuration
Subgraphs and Control Flow
ONNX supports control flow operators whose bodies are subgraphs:

If Operator
Executes one of two branch subgraphs depending on a boolean condition.
Loop Operator
Implements iterative computation: a body subgraph is executed repeatedly, driven by a trip count and/or a loop condition.

Model Metadata
Models can include custom metadata as key/value pairs in metadata_props.

Inspecting ONNX Models
Common options:
- Python: the onnx package (load, check, and print models)
- Netron: graphical visualization of the model graph
- ONNX Runtime: session input/output and metadata APIs
Best Practices
Use Symbolic Dimensions
Define dynamic dimensions with names instead of -1:
Optimize Before Deployment
Always optimize models before deployment:
- Use graph optimizations
- Consider quantization
- Convert to ORT format for production
Version Your Models
Use the model_version field to track model versions.

Document Your Model
Add documentation strings and metadata:
Next Steps

- Execution Providers: Learn how execution providers accelerate model inference
- Graph Optimizations: Understand optimization techniques for better performance
- Sessions: Deep dive into InferenceSession configuration
- Custom Operators: Learn how to add custom operators