CUDA Execution Provider
The CUDA Execution Provider enables GPU acceleration for ONNX Runtime on NVIDIA GPUs using the CUDA and cuDNN libraries.
When to Use CUDA EP
Use the CUDA Execution Provider when:
- You have NVIDIA GPUs (compute capability 6.0+)
- You need general-purpose GPU acceleration
- You want quick setup without TensorRT complexity
- You’re developing and testing before optimizing with TensorRT
- Your model has operators not supported by TensorRT
Prerequisites
Hardware Requirements
- NVIDIA GPU with compute capability 6.0 or higher
- Recommended: 4GB+ GPU memory
Software Requirements
- CUDA Toolkit: 11.8 or 12.x
- cuDNN: 8.x (matching your CUDA version)
- ONNX Runtime GPU package
Installation
Python
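For Python, the GPU build ships as the onnxruntime-gpu package on PyPI; install it in place of the CPU-only onnxruntime package:

```shell
# Replaces the CPU-only "onnxruntime" package; do not install both side by side.
pip install onnxruntime-gpu
```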
C++
Download the GPU build from the ONNX Runtime releases page.
C#
Basic Usage
Python
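A minimal Python sketch. The model path is a placeholder, and the onnxruntime import is deferred inside the helper so the provider list can be built on machines without the GPU build:

```python
import numpy as np

# Provider priority: try CUDA first, fall back to CPU if it is unavailable.
PROVIDERS = ["CUDAExecutionProvider", "CPUExecutionProvider"]

def run_once(model_path: str, batch: np.ndarray):
    """Create a CUDA-backed session and run a single inference."""
    import onnxruntime as ort  # deferred so this file loads without onnxruntime-gpu
    session = ort.InferenceSession(model_path, providers=PROVIDERS)
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: batch})

# Example call ("model.onnx" is a placeholder path):
# outputs = run_once("model.onnx", np.zeros((1, 3, 224, 224), dtype=np.float32))
```

If the CUDA provider fails to initialize, ONNX Runtime silently falls back to the next provider in the list, so it is worth verifying which provider the session actually selected via `session.get_providers()`.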
C++
C#
Configuration Options
Python Provider Options
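A hedged sketch of passing per-provider options; the option names follow the ONNX Runtime CUDA EP documentation, and the values here are illustrative:

```python
# CUDA EP provider options (names per ONNX Runtime's CUDA EP docs; values illustrative).
cuda_options = {
    "device_id": 0,                            # which GPU to run on
    "gpu_mem_limit": 4 * 1024 * 1024 * 1024,   # cap the memory arena at 4 GiB
    "arena_extend_strategy": "kSameAsRequested",
    "cudnn_conv_algo_search": "EXHAUSTIVE",    # or "HEURISTIC" / "DEFAULT"
    "use_tf32": 1,                             # TF32 matmuls on Ampere and later
    # "enable_cuda_graph": 1,                  # only valid with static input shapes
}

providers = [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]

def make_session(model_path: str):
    import onnxruntime as ort  # deferred so the config above can be built anywhere
    return ort.InferenceSession(model_path, providers=providers)
```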
Key Configuration Parameters
device_id
Specifies which GPU to use (0, 1, 2, etc.). Use when you have multiple GPUs.
gpu_mem_limit
Limits the size of ONNX Runtime's GPU memory arena (in bytes). Useful to prevent out-of-memory errors or to share a GPU across multiple processes.
cudnn_conv_algo_search
Controls how cuDNN selects convolution algorithms:
- EXHAUSTIVE: Tests all algorithms; slowest first run, best steady-state performance
- HEURISTIC: Fast selection, good for development
- DEFAULT: Uses cuDNN default
enable_cuda_graph
Captures CUDA operations into a graph for better performance. Requires static input shapes.
use_tf32
Uses TensorFloat-32 on NVIDIA Ampere and later GPUs (RTX 30/40 series, A100) for faster matrix operations with minimal accuracy impact.
Performance Optimization
Memory Management
Arena Allocation Strategy
I/O Binding (Zero-Copy)
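A sketch of the I/O-binding flow with ONNX Runtime's Python API: the input is copied to GPU memory once, the output stays on the GPU until explicitly read back, and repeated runs avoid per-call host-device transfers (assumes onnxruntime-gpu; the tensor names come from the session's metadata):

```python
import numpy as np

def infer_with_iobinding(session, input_name: str, output_name: str,
                         x: np.ndarray, device_id: int = 0) -> np.ndarray:
    """Run inference with inputs/outputs bound to GPU memory (zero-copy runs)."""
    import onnxruntime as ort  # deferred; requires the GPU build at call time
    # One host-to-device copy; the OrtValue owns the CUDA allocation.
    x_gpu = ort.OrtValue.ortvalue_from_numpy(x, "cuda", device_id)
    binding = session.io_binding()
    binding.bind_ortvalue_input(input_name, x_gpu)
    binding.bind_output(output_name, "cuda", device_id)  # ORT allocates on GPU
    session.run_with_iobinding(binding)
    return binding.get_outputs()[0].numpy()  # single device-to-host copy
```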
Avoid CPU-GPU data transfers by binding GPU memory directly.
CUDA Streams
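A hedged sketch of handing ONNX Runtime an externally created stream via the user_compute_stream provider option. The option takes the raw cudaStream_t handle encoded as a string; the commented PyTorch lines show one way to obtain such a handle, and exact semantics may vary by ONNX Runtime version:

```python
def cuda_providers_with_stream(raw_stream_ptr: int):
    """Build a provider list that runs CUDA EP work on a caller-supplied stream."""
    return [
        ("CUDAExecutionProvider", {
            "device_id": 0,
            # ORT expects the raw cudaStream_t handle, passed as a string.
            "user_compute_stream": str(raw_stream_ptr),
        }),
        "CPUExecutionProvider",
    ]

# One way to obtain a raw stream handle (assumption: PyTorch built with CUDA):
# import torch
# stream = torch.cuda.Stream()
# providers = cuda_providers_with_stream(stream.cuda_stream)
```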
Use custom CUDA streams for advanced control.
Multi-GPU
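One common pattern, sketched under the assumption that incoming requests can be sharded across sessions: create one session per device, each pinned via device_id:

```python
def providers_for_device(device_id: int):
    """Provider list that pins the CUDA EP to a specific GPU."""
    return [("CUDAExecutionProvider", {"device_id": device_id}),
            "CPUExecutionProvider"]

def sessions_per_gpu(model_path: str, device_ids=(0, 1)):
    """One InferenceSession per GPU; callers round-robin requests across them."""
    import onnxruntime as ort  # deferred; requires the GPU build at call time
    return [ort.InferenceSession(model_path, providers=providers_for_device(d))
            for d in device_ids]
```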
Run different sessions on different GPUs.
Platform Support
| Platform | Support | Notes |
|---|---|---|
| Linux x64 | ✅ Full | Best performance |
| Windows x64 | ✅ Full | Full feature support |
| Linux ARM64 | ✅ Full | NVIDIA Jetson |
| Windows ARM64 | ⚠️ Limited | Experimental |
| macOS | ❌ No | Use CPU EP |
Supported GPUs
Desktop GPUs
- RTX 40 Series (Ada Lovelace)
- RTX 30 Series (Ampere)
- RTX 20 Series (Turing)
- GTX 16 Series (Turing)
- GTX 10 Series (Pascal)
Data Center GPUs
- H100 (Hopper)
- A100, A40, A30, A10 (Ampere)
- V100, T4 (Volta/Turing)
- P100, P40 (Pascal)
Embedded/Edge
- Jetson AGX Orin
- Jetson Orin Nano/NX
- Jetson Xavier NX/AGX
- Jetson Nano (limited)
Troubleshooting
Provider Not Available
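A quick check using the standard Python API; if the CUDA EP is missing, the usual causes are having the CPU-only onnxruntime package installed instead of onnxruntime-gpu, or CUDA/cuDNN libraries not being on the library search path:

```python
def check_cuda_available():
    """Raise with a diagnostic if the CUDA EP is not registered in this build."""
    import onnxruntime as ort
    available = ort.get_available_providers()
    if "CUDAExecutionProvider" not in available:
        # Common causes: CPU-only package installed, or CUDA/cuDNN not found.
        raise RuntimeError(f"CUDA EP not available; found: {available}")
```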
Out of Memory Errors
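One mitigation sketch (option names per the CUDA EP docs, values illustrative): cap the memory arena and grow it only by the requested amount, which reduces fragmentation and leaves headroom for other processes:

```python
# Cap the CUDA memory arena and grow it only as requested.
oom_safe_options = {
    "gpu_mem_limit": 2 * 1024**3,                # 2 GiB cap (bytes)
    "arena_extend_strategy": "kSameAsRequested", # no power-of-two over-allocation
}
providers = [("CUDAExecutionProvider", oom_safe_options), "CPUExecutionProvider"]
```

Reducing batch size, or running with a smaller model variant, are complementary fixes when a hard cap alone is not enough.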
Performance Issues
- Enable EXHAUSTIVE convolution algorithm search (cudnn_conv_algo_search)
- Use I/O binding for repeated inference
- Enable CUDA graph (enable_cuda_graph) if input shapes are static
- Check GPU utilization: use nvidia-smi to monitor GPU usage
Next Steps
- For maximum NVIDIA GPU performance, see TensorRT Execution Provider
- Learn about I/O Binding for zero-copy operations
- Explore performance tuning strategies