ONNX Runtime Documentation

Cross-platform, high-performance ML inferencing and training accelerator for deep learning models from PyTorch, TensorFlow, and more

Quick start

Get up and running with ONNX Runtime in minutes

1. Install ONNX Runtime

Install the package for your platform and language. Python users can install via pip:
pip install onnxruntime
For GPU acceleration with CUDA:
pip install onnxruntime-gpu
2. Load your model

Create an inference session with your ONNX model:
import onnxruntime as ort

# Create inference session
session = ort.InferenceSession("model.onnx")
3. Run inference

Prepare your input and run inference:
import numpy as np

# Prepare input matching the model's expected shape
# (here, a random 224x224 RGB image batch for illustration)
input_name = session.get_inputs()[0].name
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Run inference; passing None returns all model outputs
outputs = session.run(None, {input_name: input_data})

Key features

Everything you need for production ML deployment

Cross-platform

Deploy on Windows, Linux, macOS, iOS, Android, and web browsers with consistent APIs

Hardware acceleration

Leverage CUDA, TensorRT, DirectML, CoreML, OpenVINO, and more for optimal performance

Multi-language support

Use Python, C/C++, C#, Java, and JavaScript with idiomatic APIs for each language

Model optimization

Automatic graph optimizations and quantization for faster inference

Training acceleration

Speed up PyTorch model training with ORTModule integration

Framework conversion

Convert models from PyTorch, TensorFlow, scikit-learn, and more

Explore by topic

Deep dive into specific areas of ONNX Runtime

Core concepts

Understand ONNX format, execution providers, and sessions

Inference

Run models for predictions across platforms and languages

Training

Accelerate model training with ORTModule

Execution providers

Hardware acceleration options for CPUs, GPUs, and specialized chips

Performance

Optimize inference speed and memory usage

Model conversion

Convert models from PyTorch, TensorFlow, and scikit-learn

API reference

Complete API documentation for all supported languages

Python API

InferenceSession, SessionOptions, quantization, and transformers

C/C++ API

OrtApi, sessions, tensors, and execution providers

C# API

InferenceSession, SessionOptions, and tensor operations

Java API

OrtSession, OrtEnvironment, and inference APIs

JavaScript API

Web and Node.js inference with WebAssembly backend

Join the community

Connect with other ONNX Runtime developers and get support

Ready to accelerate your ML models?

Start deploying high-performance models across platforms with ONNX Runtime

Install ONNX Runtime