## Available Execution Providers
ONNX Runtime GenAI supports the following execution providers:

- **CUDA**: NVIDIA GPU acceleration with comprehensive memory management
- **DirectML**: cross-vendor GPU acceleration on Windows platforms
- **OpenVINO**: Intel hardware optimization for CPU, GPU, and NPU
- **QNN**: Qualcomm NPU acceleration for edge and mobile devices
- **WebGPU**: browser-based GPU acceleration using the WebGPU API
## Platform Compatibility Matrix
| Provider | Windows | Linux | macOS | Android | Browser |
|---|---|---|---|---|---|
| CUDA | ✅ | ✅ | ❌ | ❌ | ❌ |
| DirectML | ✅ | ❌ | ❌ | ❌ | ❌ |
| OpenVINO | ✅ | ✅ | ✅ | ❌ | ❌ |
| QNN | ✅ | ✅ | ❌ | ✅ | ❌ |
| WebGPU | ✅ | ✅ | ✅ | ❌ | ✅ |
| CPU | ✅ | ✅ | ✅ | ✅ | ✅ |
## Hardware Type Support
| Provider | CPU | GPU | NPU | Target Hardware |
|---|---|---|---|---|
| CUDA | ❌ | ✅ | ❌ | NVIDIA GPUs |
| DirectML | ❌ | ✅ | ❌ | All DirectX 12 GPUs |
| OpenVINO | ✅ | ✅ | ✅ | Intel CPUs, iGPUs, NPUs |
| QNN | ❌ | ❌ | ✅ | Qualcomm Hexagon NPUs |
| WebGPU | ❌ | ✅ | ❌ | Browser-supported GPUs |
## Performance Considerations

### Memory Management
Each provider handles memory differently:

- **CUDA**: device memory with host-pinned allocations for efficient transfers
- **DirectML**: D3D12 resource management with upload/readback heaps
- **OpenVINO**: CPU-accessible memory with optional device acceleration
- **QNN**: CPU-accessible NPU memory
- **WebGPU**: GPU buffers with asynchronous CPU-GPU synchronization
### Precision Support

Supported precisions include FP32, FP16, and INT8. All providers support full-precision (FP32) inference; support for lower precisions varies by provider and hardware.
## Provider Selection Guide
Choose the right provider based on your deployment scenario:

- **Server deployment**: CUDA for NVIDIA GPU servers on Windows or Linux
- **Windows desktop**: DirectML for cross-vendor support on any DirectX 12 GPU
- **Edge devices**: OpenVINO for Intel CPUs, iGPUs, and NPUs, or QNN for Qualcomm hardware
- **Mobile deployment**: QNN for Android devices with Qualcomm Hexagon NPUs
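The scenario guidance above boils down to a small lookup plus a CPU fallback. The sketch below is illustrative only; the scenario names and the `pick_provider` helper are not part of the ONNX Runtime GenAI API:

```python
# Map deployment scenarios to a preferred execution provider,
# following the compatibility and hardware tables above.
# Scenario keys and this helper are illustrative, not an official API.
PREFERRED_PROVIDER = {
    "server": "cuda",           # NVIDIA GPUs on Windows/Linux servers
    "windows_desktop": "dml",   # DirectML: any DirectX 12 GPU
    "edge": "openvino",         # Intel CPU/iGPU/NPU edge hardware
    "mobile": "qnn",            # Qualcomm Hexagon NPUs on Android
}

def pick_provider(scenario: str) -> str:
    """Return the preferred provider for a scenario, falling back to CPU,
    which is available on every platform in the matrix above."""
    return PREFERRED_PROVIDER.get(scenario, "cpu")

print(pick_provider("server"))   # -> cuda
print(pick_provider("unknown"))  # -> cpu
```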
## Configuration in `genai_config.json`
Providers can be configured directly in your model's `genai_config.json`:
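A minimal sketch of such a configuration is shown below. Only the provider-related fields are included, and the `enable_cuda_graph` option is illustrative of CUDA-specific settings; consult your model's generated `genai_config.json` for the full set of fields:

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          {
            "cuda": {
              "enable_cuda_graph": "0"
            }
          }
        ]
      }
    }
  }
}
```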
The `provider_options` array specifies execution providers in priority order; ONNX Runtime uses the first available provider.

### Device Filtering

For multi-device systems, you can filter by hardware type through provider-specific options (for example, OpenVINO's `device_type` option selects among CPU, GPU, and NPU targets).

## Next Steps
- **CUDA Setup**: configure NVIDIA GPU acceleration
- **DirectML Setup**: enable DirectML on Windows
- **OpenVINO Setup**: optimize for Intel hardware
- **QNN Setup**: deploy to Qualcomm devices