ONNX Runtime Documentation
Cross-platform, high-performance inference and training accelerator for machine learning models from PyTorch, TensorFlow, scikit-learn, and more
Quick start
Get up and running with ONNX Runtime in minutes
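For example, running a model from Python takes only a few lines. A minimal sketch, assuming a local `model.onnx` file and the `onnxruntime` package (`pip install onnxruntime`):

```python
import numpy as np
import onnxruntime as ort

# Create an inference session on the default CPU provider.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Inspect the model's first input to learn its name (and shape/type).
input_name = session.get_inputs()[0].name

# Build a dummy float32 input; replace the shape with your model's.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Run inference; passing None as the output list returns all outputs.
outputs = session.run(None, {input_name: x})
print(outputs[0].shape)
```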
Key features
Everything you need for production ML deployment
Cross-platform
Deploy on Windows, Linux, macOS, iOS, Android, and web browsers with consistent APIs
Hardware acceleration
Leverage CUDA, TensorRT, DirectML, CoreML, OpenVINO, and more for optimal performance
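Execution providers are requested in priority order when you create a session. A sketch, assuming the `onnxruntime-gpu` package and a CUDA-capable device:

```python
import onnxruntime as ort

# Providers are tried left to right; nodes a provider can't run fall
# through to the next one, ending at the CPU.
session = ort.InferenceSession(
    "model.onnx",
    providers=[
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)
print(session.get_providers())  # the providers actually in use
```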
Multi-language support
Use Python, C/C++, C#, Java, and JavaScript with idiomatic APIs for each language
Model optimization
Automatic graph optimizations and quantization for faster inference
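As one example, dynamic quantization converts float32 weights to int8 without a calibration dataset. A minimal sketch; the file names are placeholders:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Rewrite the model with int8 weights; activations are quantized
# dynamically at runtime, so no calibration data is needed.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model.int8.onnx",
    weight_type=QuantType.QInt8,
)
```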
Training acceleration
Speed up PyTorch model training with ORTModule integration
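A minimal sketch, assuming the `onnxruntime-training` package: wrap any `torch.nn.Module` in `ORTModule` and the rest of the training loop is unchanged.

```python
import torch
from onnxruntime.training.ortmodule import ORTModule

# Wrap an existing module; forward and backward passes are compiled
# to ONNX graphs and executed by ONNX Runtime.
model = ORTModule(torch.nn.Linear(10, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(32, 10)
loss = model(x).sum()
loss.backward()   # gradients computed through ONNX Runtime
optimizer.step()
```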
Framework conversion
Convert models from PyTorch, TensorFlow, scikit-learn, and more
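For PyTorch, export is built in via `torch.onnx.export` (TensorFlow models go through `tf2onnx` instead). A minimal sketch with a toy model:

```python
import torch

# Any traceable module works; a Linear layer keeps the example small.
model = torch.nn.Linear(10, 2).eval()
dummy_input = torch.randn(1, 10)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
)
```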
Explore by topic
Deep dive into specific areas of ONNX Runtime
Core concepts
Understand ONNX format, execution providers, and sessions
Inference
Run models for predictions across platforms and languages
Training
Accelerate model training with ORTModule
Execution providers
Hardware acceleration options for CPUs, GPUs, and specialized chips
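Which providers are available depends on the package you installed (`onnxruntime`, `onnxruntime-gpu`, and so on); you can check at runtime:

```python
import onnxruntime as ort

# Providers compiled into this build, in default priority order,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'].
print(ort.get_available_providers())
```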
Performance
Optimize inference speed and memory usage
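Most tuning happens through `SessionOptions`. A sketch of two common knobs, graph optimization level and thread count:

```python
import onnxruntime as ort

opts = ort.SessionOptions()
# Enable all graph optimizations (constant folding, node fusion, ...).
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
# Cap intra-op parallelism, e.g. when sharing cores with other work.
opts.intra_op_num_threads = 4

session = ort.InferenceSession(
    "model.onnx", sess_options=opts, providers=["CPUExecutionProvider"]
)
```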
Model conversion
Convert models from PyTorch, TensorFlow, and scikit-learn
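For scikit-learn, the `skl2onnx` package handles conversion. A self-contained sketch with a toy classifier, assuming `skl2onnx` is installed:

```python
import numpy as np
from skl2onnx import to_onnx
from sklearn.linear_model import LogisticRegression

# Fit a tiny model; to_onnx infers the input signature from a sample.
X = np.random.rand(20, 4).astype(np.float32)
y = (X.sum(axis=1) > 2.0).astype(np.int64)
clf = LogisticRegression().fit(X, y)

onnx_model = to_onnx(clf, X[:1])
with open("logreg.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```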
API reference
Complete API documentation for all supported languages
Python API
InferenceSession, SessionOptions, quantization tools, and the transformers optimization module
C/C++ API
OrtApi, sessions, tensors, and execution providers
C# API
InferenceSession, SessionOptions, and tensor operations
Java API
OrtSession, OrtEnvironment, and inference APIs
JavaScript API
In-browser inference with WebAssembly and WebGPU backends, plus native Node.js bindings
Join the community
Connect with other ONNX Runtime developers and get support
Ready to accelerate your ML models?
Start deploying high-performance models across platforms with ONNX Runtime
Install ONNX Runtime