InferenceSession
The InferenceSession class is the main entry point for loading and running ONNX models. It manages model execution and execution providers, and exposes methods for running inference.
Constructor
- path_or_bytes: Path to an ONNX model file, or a serialized model as bytes. A .ort file extension indicates ORT format; otherwise ONNX format is assumed.
- sess_options: Session configuration options. See SessionOptions for details.
- providers: Execution providers in order of decreasing precedence. Entries can be provider names or tuples of (provider name, options dict). If not provided, all available providers are used.
- provider_options: Options dicts corresponding to the entries in providers. Should not be used if providers already contains tuples with options.
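As a sketch of how these constructor arguments fit together; the model path, thread-count option, and CUDA device id below are illustrative assumptions, not defaults:

```python
# Sketch: construct a session with explicit options and providers.
# "model.onnx" is a placeholder path; adjust to a real model file.
def create_session(model_path="model.onnx"):
    import onnxruntime as ort

    sess_options = ort.SessionOptions()
    sess_options.intra_op_num_threads = 2  # example configuration

    # Providers in decreasing precedence; CUDA options passed as a tuple,
    # so no separate provider_options argument is needed.
    providers = [
        ("CUDAExecutionProvider", {"device_id": 0}),
        "CPUExecutionProvider",
    ]
    return ort.InferenceSession(model_path, sess_options, providers=providers)
```

If CUDA is unavailable at runtime, ORT falls back to the next provider in the list.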
Methods
run()
Compute predictions for the given inputs.
- output_names: Names of the outputs to compute. If None, all outputs are computed.
- input_feed: Dictionary mapping input names to input values as numpy arrays.
- run_options: Run-specific options. See RunOptions.
Returns a list of output tensors as numpy arrays.
run_async()
Compute predictions asynchronously in a separate thread.
- callback: Python function that accepts the array of results and an error string. Called by an ORT worker thread when inference completes.
- user_data: User data passed to the callback function.
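A minimal sketch of driving run_async and waiting for the callback with a threading.Event; the input name "input" and the callback's exact argument order are assumptions, so verify them against your installed onnxruntime version:

```python
import threading

# Sketch: run inference asynchronously and block until the callback fires.
# Assumes the model takes a single input named "input" (hypothetical name).
def run_async_example(session, input_array):
    done = threading.Event()
    results = {}

    def on_complete(outputs, user_data, err):
        # Invoked from an ORT worker thread, not the caller's thread.
        if err:
            results["error"] = err
        else:
            results["outputs"] = outputs
        done.set()

    session.run_async(None, {"input": input_array}, on_complete, None)
    done.wait()  # in real code, consider a timeout
    return results
```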
run_with_iobinding()
Run inference using an IOBinding for GPU memory optimization.
- iobinding: IOBinding object with inputs and outputs bound to device memory. See IOBinding.
get_inputs()
Get metadata about model inputs. Returns a list of NodeArg objects describing input names, shapes, and types.
get_outputs()
Get metadata about model outputs. Returns a list of NodeArg objects describing output names, shapes, and types.
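For example, the I/O metadata can be collected into a dictionary for quick inspection (a sketch; it relies on NodeArg exposing name, shape, and type attributes):

```python
# Sketch: summarize a session's input and output metadata.
def describe_io(session):
    info = {}
    for arg in session.get_inputs():
        # shape may contain symbolic (string) dimensions for dynamic axes
        info[arg.name] = {"kind": "input", "shape": arg.shape, "type": arg.type}
    for arg in session.get_outputs():
        info[arg.name] = {"kind": "output", "shape": arg.shape, "type": arg.type}
    return info
```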
get_providers()
Get the registered execution providers for this session. Returns a list of provider names in order of precedence.
set_providers()
Register new execution providers, recreating the underlying session.
get_modelmeta()
Get model metadata. Returns a ModelMetadata object with producer name, version, description, and other fields.
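A sketch of reading fields from the returned ModelMetadata object; the attribute names below (producer_name, graph_name, custom_metadata_map) reflect the onnxruntime Python API:

```python
# Sketch: collect model metadata fields into a plain dictionary.
def read_metadata(session):
    meta = session.get_modelmeta()
    return {
        "producer": meta.producer_name,
        "graph_name": meta.graph_name,
        "description": meta.description,
        "version": meta.version,
        "custom": meta.custom_metadata_map,  # user-defined key/value pairs
    }
```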
end_profiling()
End the profiling session and return the path to the profiling results file.
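Profiling must be enabled through SessionOptions before the session is created; a sketch, where "model.onnx" and the empty feed are placeholders:

```python
# Sketch: enable profiling, run once, and collect the trace file path.
def profile_model(model_path="model.onnx", feeds=None):
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.enable_profiling = True  # must be set before session creation
    session = ort.InferenceSession(model_path, opts)
    session.run(None, feeds or {})
    # Returns the path of a JSON trace file written by ORT.
    return session.end_profiling()
```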
Example Usage
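A minimal end-to-end sketch, assuming a model file "model.onnx" with a single float32 input; the input shape below is an illustrative assumption:

```python
# Sketch: load a model and run inference on random data.
def basic_inference(model_path="model.onnx"):
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    # Shape (1, 3, 224, 224) is a placeholder; read it from get_inputs() in practice.
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)
    outputs = session.run(None, {input_name: x})  # None -> compute all outputs
    return outputs
```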
GPU Memory Optimization
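A sketch of binding inputs and outputs to CUDA device memory with run_with_iobinding, so repeated runs avoid host/device copies; the device index and the single-input/single-output layout are assumptions:

```python
# Sketch: run inference with I/O bound to GPU memory via IOBinding.
def run_on_gpu(session, input_array):
    import onnxruntime as ort

    binding = session.io_binding()
    # Copy the input to CUDA device 0 and bind it by name.
    x_ortvalue = ort.OrtValue.ortvalue_from_numpy(input_array, "cuda", 0)
    binding.bind_ortvalue_input(session.get_inputs()[0].name, x_ortvalue)
    # Let ORT allocate the output on the same device.
    binding.bind_output(session.get_outputs()[0].name, "cuda", 0)
    session.run_with_iobinding(binding)
    # Copy results back to host memory only when actually needed.
    return binding.copy_outputs_to_cpu()
```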
Related APIs
- SessionOptions - Configure session behavior
- RunOptions - Configure individual runs
- IOBinding - Bind I/O to device memory
- Execution Providers - Configure hardware acceleration