Overview
ORT format models are serialized using FlatBuffers, providing:
- Faster loading: Zero-copy deserialization for instant model loading
- Smaller file size: Optimized binary representation
- Runtime optimizations: Pre-applied graph optimizations are preserved
- Execution provider support: EP-specific optimizations can be saved
Converting to ORT Format
Using Python
Convert ONNX models to ORT format using the Python API:
Using onnxruntime.tools.convert_onnx_models_to_ort
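The conversion is typically driven through the onnxruntime.tools.convert_onnx_models_to_ort module named above. The sketch below builds the CLI invocation rather than invoking the library directly; the --optimization_style flag and its "Fixed" value are assumptions here, and model.onnx is a placeholder path:

```python
import sys

def ort_convert_command(model_path: str, optimization_style: str = "Fixed") -> list:
    """Build the CLI invocation for onnxruntime's ONNX-to-ORT converter."""
    return [
        sys.executable, "-m", "onnxruntime.tools.convert_onnx_models_to_ort",
        model_path,
        "--optimization_style", optimization_style,  # assumed flag: "Fixed" or "Runtime"
    ]

# To actually convert (requires onnxruntime to be installed):
# import subprocess
# subprocess.run(ort_convert_command("model.onnx"), check=True)
```

Running the module this way produces a .ort file next to the input model.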
Conversion Options
ORT Format Versions
The ORT format has evolved across ONNX Runtime versions:
Version 6 (Current)
- Support for Float8 types (E4M3FN, E5M2)
- Enhanced type system for quantization
Version 5
- Removed kernel def hashes
- Added KernelTypeStrResolver for EP support
- Enables additional execution providers in minimal builds
Version 4
- Updated kernel def hashing (not backwards compatible)
Version 3
- Added graph_doc_string field support
Version 2
- Sparse initializers support
Version 1
- Initial FlatBuffers implementation
- Basic model, graph, and operator support
Backwards Compatibility
ONNX Runtime 1.14+
- Full builds: Can load older ORT format models (v1-v4), but saved optimizations are ignored
- Minimal builds: Cannot load models older than version 5
Upgrading Old Models
To upgrade an older ORT format model, re-run the conversion from the original ONNX model using the newer ONNX Runtime version; there is no in-place upgrade path.
Using ORT Format Models
Loading ORT Models
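In Python, loading an .ort model can look like the sketch below. It assumes onnxruntime is installed and a model.ort file exists; the session.load_model_format config key comes from ONNX Runtime's documentation and is optional when the file already has the .ort extension:

```python
# Sketch: load an ORT format model with the Python API.
LOAD_FORMAT_KEY = "session.load_model_format"  # ONNX Runtime session config key

def make_ort_session(path: str = "model.ort"):
    import onnxruntime as rt  # imported lazily so the helper stands alone
    so = rt.SessionOptions()
    # Explicitly declare the file as ORT format (otherwise inferred from
    # the .ort file extension).
    so.add_session_config_entry(LOAD_FORMAT_KEY, "ORT")
    return rt.InferenceSession(path, sess_options=so)
```

The returned session is used exactly like one created from an .onnx file.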
C++ Example
JavaScript/WebAssembly
Minimal Builds
ORT format is essential for minimal builds, which exclude the ONNX protobuf parser and can only load ORT format models.
Graph Optimizations
Optimization Levels
- disabled: No optimizations
- basic: Constant folding, redundant node elimination
- extended: Advanced optimizations like operator fusion
- layout: Layout transformations for hardware efficiency
- all: All available optimizations
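With the Python API, the levels above correspond to members of the GraphOptimizationLevel enum. The mapping below is a sketch; the newer layout level is omitted because its Python enum name is not covered here:

```python
# Map the level names above to onnxruntime's GraphOptimizationLevel members.
LEVEL_TO_ENUM = {
    "disabled": "ORT_DISABLE_ALL",
    "basic": "ORT_ENABLE_BASIC",
    "extended": "ORT_ENABLE_EXTENDED",
    "all": "ORT_ENABLE_ALL",
}

def session_options_with_level(level: str):
    import onnxruntime as rt  # lazy import keeps the mapping testable on its own
    so = rt.SessionOptions()
    so.graph_optimization_level = getattr(rt.GraphOptimizationLevel, LEVEL_TO_ENUM[level])
    return so
```

Passing the resulting options to InferenceSession applies the chosen level at load time.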
Preserving Optimizations
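One way to preserve optimizations is to run them once and serialize the optimized graph into the .ort file. A hedged sketch, assuming onnxruntime is installed and model.onnx exists; optimized_model_filepath and the session.save_model_format config key are taken from ONNX Runtime's Python API:

```python
SAVE_FORMAT_KEY = "session.save_model_format"  # ONNX Runtime session config key

def save_optimized_as_ort(src: str = "model.onnx", dst: str = "model.ort"):
    import onnxruntime as rt
    so = rt.SessionOptions()
    so.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
    so.optimized_model_filepath = dst
    # Ask the runtime to serialize in ORT (FlatBuffers) format rather than ONNX.
    so.add_session_config_entry(SAVE_FORMAT_KEY, "ORT")
    rt.InferenceSession(src, sess_options=so)  # creating the session writes dst
```

The saved model then loads without re-running those optimization passes.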
File Structure
ORT format files use a FlatBuffers schema with:
- Model metadata: Version, producer, domain
- Graph: Nodes, initializers, inputs/outputs
- Operator kernels: Kernel type resolvers for execution providers
- Runtime optimizations: Pre-computed graph transformations
Best Practices
When to Use ORT Format
Use ORT format when:
- Deploying to production environments
- Using minimal builds
- Loading time is critical
- You want to preserve runtime optimizations
Prefer ONNX format when:
- Still in development/experimentation
- You need cross-framework compatibility
- Debugging models with visualization tools
Optimization Workflow
- Develop with ONNX format
- Optimize and convert to ORT format
- Test the ORT model thoroughly
- Deploy the ORT format model
Security Considerations
Performance Benefits
Load Time Comparison
| Model Size | ONNX Load | ORT Load | Improvement |
|---|---|---|---|
| 10 MB | 45 ms | 5 ms | 9x faster |
| 100 MB | 420 ms | 35 ms | 12x faster |
| 1 GB | 4.2 s | 280 ms | 15x faster |
Memory Usage
- Zero-copy deserialization reduces memory overhead
- Immediate access to model data without parsing