ONNX Runtime Documentation

Cross-platform, high-performance ML inferencing and training accelerator for deep learning models from PyTorch, TensorFlow, and more

Quick start

Get up and running with ONNX Runtime in minutes

1. Install ONNX Runtime

Install the package for your platform and language. Python users can install via pip:
pip install onnxruntime
For GPU acceleration with CUDA:
pip install onnxruntime-gpu
2. Load your model

Create an inference session with your ONNX model:
import onnxruntime as ort

# Create inference session
session = ort.InferenceSession("model.onnx")
3. Run inference

Prepare your input and run inference:
import numpy as np

# Prepare input matching the model's expected shape
# (here, a random 224x224 RGB image batch for illustration)
input_name = session.get_inputs()[0].name
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Run inference; passing None returns all model outputs
outputs = session.run(None, {input_name: input_data})

Key features

Everything you need for production ML deployment

Cross-platform

Deploy on Windows, Linux, macOS, iOS, Android, and web browsers with consistent APIs

Hardware acceleration

Leverage CUDA, TensorRT, DirectML, CoreML, OpenVINO, and more for optimal performance

Multi-language support

Use Python, C/C++, C#, Java, and JavaScript with idiomatic APIs for each language

Model optimization

Automatic graph optimizations and quantization for faster inference

Training acceleration

Speed up PyTorch model training with ORTModule integration

Framework conversion

Convert models from PyTorch, TensorFlow, scikit-learn, and more

Explore by topic

Deep dive into specific areas of ONNX Runtime

Core concepts

Understand ONNX format, execution providers, and sessions

Inference

Run models for predictions across platforms and languages

Training

Accelerate model training with ORTModule

Execution providers

Hardware acceleration options for CPUs, GPUs, and specialized chips

Performance

Optimize inference speed and memory usage

Model conversion

Convert models from PyTorch, TensorFlow, and scikit-learn

API reference

Complete API documentation for all supported languages

Python API

InferenceSession, SessionOptions, quantization, and transformers

C/C++ API

OrtApi, sessions, tensors, and execution providers

C# API

InferenceSession, SessionOptions, and tensor operations

Java API

OrtSession, OrtEnvironment, and inference APIs

JavaScript API

Web and Node.js inference with WebAssembly backend

Join the community

Connect with other ONNX Runtime developers and get support

Ready to accelerate your ML models?

Start deploying high-performance models across platforms with ONNX Runtime

Install ONNX Runtime