ONNX Runtime GenAI supports multiple hardware acceleration providers to optimize model inference across different platforms. Each execution provider targets specific hardware and offers unique performance characteristics.

Available Execution Providers

ONNX Runtime GenAI supports the following execution providers:

CUDA

NVIDIA GPU acceleration with comprehensive memory management

DirectML

Cross-vendor GPU acceleration on Windows platforms

OpenVINO

Intel hardware optimization for CPU, GPU, and NPU

QNN

Qualcomm NPU acceleration for edge and mobile devices

WebGPU

Browser-based GPU acceleration using WebGPU API

Platform Compatibility Matrix

| Provider | Windows | Linux | macOS | Android | Browser |
|----------|---------|-------|-------|---------|---------|
| CUDA     | ✓ | ✓ |   |   |   |
| DirectML | ✓ |   |   |   |   |
| OpenVINO | ✓ | ✓ |   |   |   |
| QNN      | ✓ |   |   | ✓ |   |
| WebGPU   |   |   |   |   | ✓ |
| CPU      | ✓ | ✓ | ✓ | ✓ |   |

Hardware Type Support

| Provider | CPU | GPU | NPU | Target Hardware |
|----------|-----|-----|-----|-----------------|
| CUDA     |   | ✓ |   | NVIDIA GPUs |
| DirectML |   | ✓ |   | All DirectX 12 GPUs |
| OpenVINO | ✓ | ✓ | ✓ | Intel CPUs, iGPUs, NPUs |
| QNN      |   |   | ✓ | Qualcomm Hexagon NPUs |
| WebGPU   |   | ✓ |   | Browser-supported GPUs |

Performance Considerations

Memory Management

Each provider handles memory differently:
  • CUDA: Device memory with host-pinned allocations for efficient transfers
  • DirectML: D3D12 resource management with upload/readback heaps
  • OpenVINO: CPU-accessible memory with optional device acceleration
  • QNN: CPU-accessible NPU memory
  • WebGPU: GPU buffers with async CPU-GPU synchronization

Precision Support

All providers support full-precision (FP32) inference. Support for reduced precision varies by provider and hardware: GPU providers such as CUDA and DirectML also run FP16 models, and quantized (for example, INT4) model variants are commonly used to reduce memory footprint.

Provider Selection Guide

Choose the right provider based on your deployment scenario:

Server Deployment

import onnxruntime_genai as og

# NVIDIA GPU server
config = og.Config(model_path)
config.clear_providers()
config.append_provider("cuda")
model = og.Model(config)

Windows Desktop

import onnxruntime_genai as og

# Cross-vendor GPU support
config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")
model = og.Model(config)

Edge Devices

import onnxruntime_genai as og

# Intel edge hardware
config = og.Config(model_path)
config.clear_providers()
config.append_provider("openvino")
config.set_provider_option("openvino", "device_type", "CPU")
model = og.Model(config)

Mobile Deployment

import onnxruntime_genai as og

# Qualcomm Snapdragon devices
config = og.Config(model_path)
config.clear_providers()
config.append_provider("qnn")
model = og.Model(config)
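The per-scenario snippets above each pin a single provider. When one script must run across several of these targets, a small helper can pick the provider name from the current platform. This is a hedged sketch: the mapping below is an illustrative assumption, not an official selection heuristic.

```python
import platform

def default_provider() -> str:
    # Hypothetical helper: map the current platform to a provider name
    # accepted by config.append_provider(). The defaults are assumptions
    # drawn from the scenarios above, not an official heuristic.
    system = platform.system()
    machine = platform.machine().lower()
    if system == "Windows":
        # QNN targets Snapdragon (ARM64) Windows devices; DirectML covers
        # cross-vendor DirectX 12 GPUs on x64.
        return "qnn" if ("arm" in machine or "aarch64" in machine) else "dml"
    if system == "Linux":
        # Assumes an NVIDIA GPU server; use "openvino" or "cpu" otherwise.
        return "cuda"
    return "cpu"
```

The returned name can then be passed straight to `config.append_provider(default_provider())` in the snippets above.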

Configuration in genai_config.json

Providers can be configured directly in your model’s genai_config.json:
{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          {
            "cuda": {}
          }
        ]
      }
    }
  }
}
The provider_options array specifies execution providers in priority order. ONNX Runtime will use the first available provider.
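Because the `provider_options` array is ordered by priority, the same structure can be generated programmatically when you ship one model to multiple targets. A minimal sketch (the helper name and the CUDA-then-OpenVINO ordering are illustrative assumptions, not part of the library):

```python
import json

def make_provider_options(*providers_with_options):
    # Build the session_options fragment of genai_config.json.
    # Each argument is a (provider_name, options_dict) pair;
    # list order sets provider priority.
    return {
        "model": {
            "decoder": {
                "session_options": {
                    "provider_options": [
                        {name: opts} for name, opts in providers_with_options
                    ]
                }
            }
        }
    }

# CUDA first, OpenVINO on CPU as a lower-priority alternative.
config = make_provider_options(("cuda", {}),
                               ("openvino", {"device_type": "CPU"}))
print(json.dumps(config, indent=2))
```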

Device Filtering

For multi-device systems, you can filter by hardware type:
{
  "provider_options": [
    {
      "openvino": {
        "device_type": "GPU"
      },
      "device_filtering_options": {
        "hardware_device_type": "gpu",
        "hardware_device_id": 0
      }
    }
  ]
}
Provider availability depends on your installation. Install provider-specific packages:
  • CUDA: onnxruntime-genai-cuda
  • DirectML: onnxruntime-genai-directml
  • Other providers may require building from source.
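Since each provider ships in its own wheel, a runtime check for which package is installed can catch misconfiguration before a provider request fails. A sketch using the standard library's `importlib.metadata` (the package names are the ones listed above; `onnxruntime-genai` is assumed to be the default CPU wheel):

```python
from importlib import metadata

def installed_genai_package():
    # Return the first installed onnxruntime-genai distribution, or None.
    # Distribution names come from the list above.
    for dist in ("onnxruntime-genai-cuda",
                 "onnxruntime-genai-directml",
                 "onnxruntime-genai"):
        try:
            metadata.version(dist)
            return dist
        except metadata.PackageNotFoundError:
            continue
    return None
```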

Next Steps

CUDA Setup

Configure NVIDIA GPU acceleration

DirectML Setup

Enable DirectML on Windows

OpenVINO Setup

Optimize for Intel hardware

QNN Setup

Deploy to Qualcomm devices
