Execution Providers Overview
Execution Providers (EPs) are the interface between ONNX Runtime and hardware acceleration libraries. They enable ONNX Runtime to execute models on different hardware platforms with optimal performance.

What are Execution Providers?
Execution Providers abstract the details of hardware-specific acceleration, allowing ONNX Runtime to leverage:
- GPUs via CUDA, TensorRT, DirectML, and ROCm
- Specialized hardware like Intel OpenVINO, Qualcomm QNN, and Apple Neural Engine
- Web platforms via WebGPU and WebAssembly
- CPU optimizations through oneDNN and XNNPACK
How Execution Providers Work
When you create an inference session, you specify execution providers in order of priority. ONNX Runtime will:
- Attempt to assign each operator to the first provider in the list
- Fall back to subsequent providers if operators are unsupported
- Use the CPU provider as the final fallback
Available Execution Providers
GPU Acceleration
| Provider | Platform | Best For |
|---|---|---|
| CUDA | NVIDIA GPUs | General-purpose NVIDIA acceleration |
| TensorRT | NVIDIA GPUs | Maximum NVIDIA inference performance |
| DirectML | Windows (any GPU vendor) | GPU acceleration on Windows |
| ROCm | AMD GPUs (Linux) | AMD GPU acceleration |
Specialized Hardware
| Provider | Platform | Best For |
|---|---|---|
| OpenVINO | Intel hardware | Intel CPU/GPU optimization |
| QNN | Qualcomm Snapdragon | Qualcomm NPUs |
| CoreML | iOS/macOS | Apple Neural Engine |
| NNAPI | Android | Android accelerators |
Web Platforms
| Provider | Platform | Best For |
|---|---|---|
| WebGPU | Browsers | GPU acceleration in browsers |
| WebAssembly | Browsers | CPU inference in browsers |
CPU Optimization
| Provider | Platform | Best For |
|---|---|---|
| oneDNN | Intel CPUs | Intel CPU optimization |
| XNNPACK | Mobile/ARM | Mobile and ARM devices |
Choosing an Execution Provider
By Platform
Windows Desktop/Server
- NVIDIA GPU: CUDA or TensorRT
- AMD GPU: DirectML
- Intel GPU: DirectML or OpenVINO
- CPU: OpenVINO (Intel) or CPU EP

Linux Server
- NVIDIA GPU: CUDA or TensorRT
- AMD GPU: ROCm
- Intel: OpenVINO
- CPU: CPU EP or oneDNN

Mobile
- iOS/macOS: CoreML
- Android (Qualcomm): QNN
- Android (other): NNAPI

Web
- GPU: WebGPU
- CPU: WebAssembly
By Use Case
Maximum Performance (Server)
- NVIDIA: TensorRT with FP16/INT8
- AMD: ROCm
- Intel: OpenVINO

Broad Compatibility
- DirectML (Windows)
- CPU EP (all platforms)

Mobile/Edge
- CoreML (Apple devices)
- QNN (Qualcomm)
- NNAPI (Android)
- CPU EP (reference implementation)
Configuration Example
Provider Priority and Fallback
Providers are evaluated in the order specified. If a provider cannot handle an operator:
- The operator is assigned to the next provider in the list
- The session may use multiple providers for different operators
- CPU provider handles any remaining operators
Checking Available Providers
Performance Considerations
Memory Management
- Configure arena allocation strategies for GPU providers
- Set memory limits to prevent OOM errors
- Use memory-efficient data types (FP16, INT8) when supported
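For example, the CUDA provider accepts per-provider options such as `gpu_mem_limit` and `arena_extend_strategy`. A sketch of capping the arena (the specific values here are illustrative):

```python
# Illustrative CUDA provider options: cap the memory arena at 2 GiB and
# grow it only by the requested amount instead of doubling.
cuda_options = {
    "device_id": 0,
    "gpu_mem_limit": 2 * 1024**3,
    "arena_extend_strategy": "kSameAsRequested",
}

# Pass (name, options) tuples in the providers list when creating a session.
providers = [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]
```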
Data Transfer
- Minimize CPU-GPU data transfers
- Use I/O binding for zero-copy operations
- Keep data on device between inferences when possible
Graph Optimization
- Enable graph optimizations (on by default)
- Some providers apply additional optimizations
- TensorRT and OpenVINO build optimized engines
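Because engine building is slow, the TensorRT provider can persist its built engines; a sketch using its caching and FP16 options (the cache path is an arbitrary example):

```python
# Illustrative TensorRT provider options: enable FP16 kernels and cache the
# built engine on disk so later sessions skip the engine build step.
trt_options = {
    "trt_fp16_enable": True,
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "./trt_cache",
}
providers = [("TensorrtExecutionProvider", trt_options), "CPUExecutionProvider"]
```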
Next Steps
- Learn about specific providers: CUDA, TensorRT, DirectML
- Explore performance tuning
- See model optimization for preprocessing steps