
Execution Providers Overview

Execution Providers (EPs) are the interface between ONNX Runtime and hardware acceleration libraries. They enable ONNX Runtime to execute models on different hardware platforms with optimal performance.

What are Execution Providers?

Execution Providers abstract the details of hardware-specific acceleration, allowing ONNX Runtime to leverage:
  • GPUs via CUDA, TensorRT, DirectML, and ROCm
  • Specialized accelerators via Intel OpenVINO, Qualcomm QNN, and Apple CoreML (Neural Engine)
  • Web platforms via WebGPU and WebAssembly
  • CPU optimizations through oneDNN and XNNPACK

How Execution Providers Work

When you create an inference session, you specify execution providers in order of priority. ONNX Runtime will:
  1. Attempt to assign operators to the first provider
  2. Fall back to subsequent providers if operators are unsupported
  3. Use the CPU provider as the final fallback
import onnxruntime as ort

# Providers are tried in order of priority
session = ort.InferenceSession(
    "model.onnx",
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
)

Available Execution Providers

GPU Acceleration

Provider | Platform    | Best For
CUDA     | NVIDIA GPUs | General GPU acceleration
TensorRT | NVIDIA GPUs | Maximum performance on NVIDIA
DirectML | Windows     | Cross-vendor GPU support on Windows
ROCm     | AMD GPUs    | AMD GPU acceleration

Specialized Hardware

Provider | Platform | Best For
OpenVINO | Intel    | Intel CPUs, GPUs, VPUs
QNN      | Qualcomm | Snapdragon processors
CoreML   | Apple    | iOS, macOS devices

Web Platforms

Provider    | Platform | Best For
WebGPU      | Browsers | GPU acceleration in browsers
WebAssembly | Browsers | CPU inference in browsers

CPU Optimization

Provider | Platform   | Best For
oneDNN   | Intel CPUs | Intel CPU optimization
XNNPACK  | Mobile/ARM | Mobile and ARM devices

Choosing an Execution Provider

By Platform

Windows Desktop/Server
  • NVIDIA GPU: CUDA or TensorRT
  • AMD GPU: DirectML
  • Intel GPU: DirectML or OpenVINO
  • CPU: OpenVINO (Intel) or CPU EP
Linux Server
  • NVIDIA GPU: CUDA or TensorRT
  • AMD GPU: ROCm
  • Intel: OpenVINO
  • CPU: CPU EP or oneDNN
Mobile Devices
  • iOS/macOS: CoreML
  • Android (Qualcomm): QNN
  • Android (other): NNAPI
Web/Browser
  • GPU: WebGPU
  • CPU: WebAssembly
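
The platform guidance above can be captured in a small lookup table. The sketch below is illustrative: the provider strings are ONNX Runtime's real identifiers, but the platform/hardware keys and the helper name are our own.

```python
# Illustrative mapping from (platform, hardware) to a provider priority list.
# Provider strings are ONNX Runtime's identifiers; the keys are our own.
PROVIDER_PRIORITIES = {
    ("windows", "nvidia"): ["TensorrtExecutionProvider", "CUDAExecutionProvider"],
    ("windows", "amd"):    ["DmlExecutionProvider"],
    ("windows", "intel"):  ["DmlExecutionProvider", "OpenVINOExecutionProvider"],
    ("linux",   "nvidia"): ["TensorrtExecutionProvider", "CUDAExecutionProvider"],
    ("linux",   "amd"):    ["ROCMExecutionProvider"],
    ("linux",   "intel"):  ["OpenVINOExecutionProvider"],
}

def providers_for(platform: str, hardware: str) -> list:
    """Return a provider priority list, always ending with the CPU EP
    so there is a final fallback for unsupported operators."""
    eps = PROVIDER_PRIORITIES.get((platform, hardware), [])
    return eps + ["CPUExecutionProvider"]
```

The returned list can be passed directly as the `providers` argument to `ort.InferenceSession`.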

By Use Case

Maximum Performance (Server)
  • NVIDIA: TensorRT with FP16/INT8
  • AMD: ROCm
  • Intel: OpenVINO
Cross-Platform Compatibility
  • DirectML (Windows)
  • CPU EP (all platforms)
Low Latency (Edge/Mobile)
  • CoreML (Apple devices)
  • QNN (Qualcomm)
  • NNAPI (Android)
Development/Testing
  • CPU EP (reference implementation)
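
Whichever use case you target, it is safest to intersect your preferred list with what the installed build actually supports (obtainable from `ort.get_available_providers()`). A minimal sketch, with the available list passed in explicitly so the filtering logic stands on its own:

```python
def select_providers(preferred, available):
    """Keep the preferred providers that are actually available, in
    priority order, falling back to the CPU EP if none match."""
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

# In practice `available` would come from ort.get_available_providers().
```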

Configuration Example

Python

import onnxruntime as ort

# Basic usage
session = ort.InferenceSession(
    "model.onnx",
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
)

# With provider options
session = ort.InferenceSession(
    "model.onnx",
    providers=[
        ('CUDAExecutionProvider', {
            'device_id': 0,
            'arena_extend_strategy': 'kNextPowerOfTwo',
            'gpu_mem_limit': 2 * 1024 * 1024 * 1024,
            'cudnn_conv_algo_search': 'EXHAUSTIVE',
            'do_copy_in_default_stream': True,
        }),
        'CPUExecutionProvider'
    ]
)

C++

#include <onnxruntime_cxx_api.h>

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "test");
Ort::SessionOptions session_options;

// Add CUDA provider
OrtCUDAProviderOptions cuda_options;
cuda_options.device_id = 0;
session_options.AppendExecutionProvider_CUDA(cuda_options);

Ort::Session session(env, "model.onnx", session_options);

C#

using Microsoft.ML.OnnxRuntime;

var sessionOptions = new SessionOptions();
sessionOptions.AppendExecutionProvider_CUDA(0);

using var session = new InferenceSession("model.onnx", sessionOptions);

Provider Priority and Fallback

Providers are evaluated in the order specified. If a provider cannot handle an operator:
  1. The operator is assigned to the next provider in the list
  2. The session may use multiple providers for different operators
  3. CPU provider handles any remaining operators
# TensorRT will handle compatible ops, CUDA handles others
session = ort.InferenceSession(
    "model.onnx",
    providers=[
        'TensorrtExecutionProvider',
        'CUDAExecutionProvider',
        'CPUExecutionProvider'
    ]
)

Checking Available Providers

Python

import onnxruntime as ort

# List all available providers
print(ort.get_available_providers())
# Example output (depends on your build):
# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

# Check which providers are used by a session
session = ort.InferenceSession("model.onnx")
print(session.get_providers())

C++

#include <onnxruntime_cxx_api.h>

auto available_providers = Ort::GetAvailableProviders();
for (const auto& provider : available_providers) {
    std::cout << provider << std::endl;
}

Performance Considerations

Memory Management

  • Configure arena allocation strategies for GPU providers
  • Set memory limits to prevent OOM errors
  • Use memory-efficient data types (FP16, INT8) when supported

Data Transfer

  • Minimize CPU-GPU data transfers
  • Use I/O binding for zero-copy operations
  • Keep data on device between inferences when possible

Graph Optimization

  • Enable graph optimizations (on by default)
  • Some providers apply additional optimizations
  • TensorRT and OpenVINO build optimized engines

Next Steps