DirectML Execution Provider
The DirectML Execution Provider enables GPU acceleration on Windows using DirectML, Microsoft’s hardware-accelerated DirectX 12 API for machine learning. DirectML supports any DirectX 12-capable GPU from NVIDIA, AMD, Intel, and Qualcomm.
When to Use DirectML EP
Use the DirectML Execution Provider when:
- You’re running on Windows 10 (1903+) or Windows 11
- You need cross-vendor GPU support (NVIDIA, AMD, Intel, Qualcomm)
- You’re developing Windows desktop applications
- You want to support a wide range of GPUs without driver-specific code
- You’re targeting Windows-on-ARM devices (Surface Pro X, etc.)
- You need NPU acceleration on compatible devices
Key Features
- Cross-Vendor: Works with NVIDIA, AMD, Intel, and Qualcomm GPUs
- Wide Hardware Support: Any DirectX 12-capable GPU
- NPU Support: Leverage Neural Processing Units on compatible hardware
- Windows Integration: Optimized for Windows platform
- Single API: No need for vendor-specific SDKs
Prerequisites
Hardware Requirements
- DirectX 12-capable GPU
- Windows 10 (version 1903 or later) or Windows 11
- Minimum 2GB GPU memory recommended
Supported GPUs
- NVIDIA: GTX 900 series and newer
- AMD: Radeon RX 400 series and newer
- Intel: HD Graphics 6xx and newer (Kaby Lake+)
- Qualcomm: Adreno GPUs in Snapdragon processors
Software Requirements
- Windows 10 (1903+) or Windows 11
- ONNX Runtime DirectML package
- Up-to-date GPU drivers
Installation
Python
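The Python build is published on PyPI as `onnxruntime-directml` and is installed with pip; it exposes the usual `onnxruntime` module. A minimal sketch for confirming that the installed build includes DirectML (the helper name `directml_installed` is hypothetical, not part of ONNX Runtime):

```python
import importlib.util


def directml_installed() -> bool:
    """Return True if onnxruntime is importable and lists the DirectML provider."""
    if importlib.util.find_spec("onnxruntime") is None:
        return False  # no ONNX Runtime package is installed at all
    import onnxruntime as ort

    return "DmlExecutionProvider" in ort.get_available_providers()


if __name__ == "__main__":
    print("DirectML available:", directml_installed())
```

If this prints `False` on a Windows machine, the CPU-only `onnxruntime` package may be installed instead of `onnxruntime-directml`.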
C++
Download the DirectML-enabled build from ONNX Runtime releases.
C#/.NET
UWP (Universal Windows Platform)
Basic Usage
Python
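A minimal sketch of creating a session that prefers DirectML and keeps the CPU provider as a fallback. The helper names `select_providers` and `create_session` are illustrative, and the model path passed in is whatever ONNX file you are loading:

```python
from typing import List, Sequence

DML = "DmlExecutionProvider"
CPU = "CPUExecutionProvider"


def select_providers(available: Sequence[str]) -> List[str]:
    """Return DirectML first, then CPU, keeping only providers that are available."""
    return [p for p in (DML, CPU) if p in available]


def create_session(model_path: str):
    """Create an InferenceSession on DirectML when present (requires onnxruntime)."""
    import onnxruntime as ort  # imported lazily so select_providers works without it

    providers = select_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

After creating a session, `session.get_providers()` reports which providers were actually registered, which is a quick way to confirm that DirectML is in use.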
C++
C#
WinRT/UWP (C#)
Configuration Options
Device Selection
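On machines with more than one adapter, the DirectML provider accepts a `device_id` option that picks the adapter by index, with index 0 being the default adapter. A small sketch of building the provider list for `InferenceSession`; the helper name `dml_providers` is hypothetical:

```python
from typing import Any, Dict, List, Tuple, Union

ProviderSpec = Union[str, Tuple[str, Dict[str, Any]]]


def dml_providers(device_id: int = 0) -> List[ProviderSpec]:
    """Build a providers list that targets one adapter, with CPU as fallback."""
    if device_id < 0:
        raise ValueError("device_id must be a non-negative adapter index")
    return [
        ("DmlExecutionProvider", {"device_id": device_id}),
        "CPUExecutionProvider",
    ]
```

The result can be passed directly as `providers=dml_providers(1)` when constructing an `onnxruntime.InferenceSession`.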
Performance Preferences
Device Filtering
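Performance preferences and device filtering can be expressed as provider options. Some ONNX Runtime releases accept `performance_preference` and `device_filter` keys for the DirectML provider; the key names and the accepted values used below are assumptions to check against the version you have installed, and the helper `dml_device_options` is hypothetical:

```python
from typing import Dict

# Assumed option values; verify them against your ONNX Runtime version.
PERFORMANCE_PREFERENCES = ("default", "high_performance", "minimum_power")
DEVICE_FILTERS = ("gpu", "npu", "any")


def dml_device_options(
    performance_preference: str = "default",
    device_filter: str = "gpu",
) -> Dict[str, str]:
    """Validate the two settings and return a provider-options dictionary."""
    if performance_preference not in PERFORMANCE_PREFERENCES:
        raise ValueError(f"unknown performance preference: {performance_preference!r}")
    if device_filter not in DEVICE_FILTERS:
        raise ValueError(f"unknown device filter: {device_filter!r}")
    return {
        "performance_preference": performance_preference,
        "device_filter": device_filter,
    }
```

The dictionary would be paired with the provider name, for example `("DmlExecutionProvider", dml_device_options("high_performance"))`, in the `providers` argument of `InferenceSession`.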
Advanced Configuration
C++ Advanced Options
Custom D3D12 Device
Multi-GPU Support
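One way to use several GPUs is to create one session per adapter, each with its own `device_id`, and distribute work across them. The sketch below shows only the bookkeeping, under the assumption that each session is bound to a single adapter; `providers_per_device` and `assign_round_robin` are hypothetical helpers:

```python
from itertools import cycle
from typing import Any, Dict, List, Sequence, Tuple


def providers_per_device(device_ids: Sequence[int]) -> Dict[int, List[Any]]:
    """Map each adapter index to the providers list for its own session."""
    return {
        d: [("DmlExecutionProvider", {"device_id": d}), "CPUExecutionProvider"]
        for d in device_ids
    }


def assign_round_robin(jobs: Sequence[Any], device_ids: Sequence[int]) -> List[Tuple[Any, int]]:
    """Pair each job with an adapter index, cycling through the adapters in order."""
    if not device_ids:
        raise ValueError("at least one device id is required")
    return list(zip(jobs, cycle(device_ids)))
```

Each entry of `providers_per_device([0, 1])` would be passed to a separate `InferenceSession`, and `assign_round_robin` decides which session handles each input.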
NPU Acceleration
DirectML can use Neural Processing Units on compatible devices, such as:
- Intel Core Ultra (Meteor Lake) with Intel AI Boost
- AMD Ryzen AI processors
- Qualcomm Snapdragon X Elite/Plus
- Some Surface devices
Performance Optimization
Memory Management
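The DirectML provider does not support ONNX Runtime's memory-pattern optimization, so it should be turned off on the session options before the session is created. A minimal sketch; `configure_memory` and `make_session_options` are hypothetical helper names:

```python
def configure_memory(sess_options):
    """Turn off memory-pattern planning, which the DirectML EP does not support."""
    sess_options.enable_mem_pattern = False
    return sess_options


def make_session_options():
    """Build SessionOptions with the memory setting applied (requires onnxruntime)."""
    import onnxruntime as ort

    return configure_memory(ort.SessionOptions())
```

The returned object is passed as `sess_options=` when constructing the `InferenceSession`.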
Session Options
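DirectML sessions are expected to run in sequential execution mode, since parallel execution is not supported by this provider. The sketch below also requests full graph optimization, which is an illustrative choice rather than a DirectML requirement. The module is passed in as `ort_module` purely so the helper can be exercised without ONNX Runtime installed; `configure_session` is a hypothetical name:

```python
def configure_session(sess_options, ort_module):
    """Apply execution-mode and optimization settings for a DirectML session."""
    # Parallel execution is not supported by the DirectML EP.
    sess_options.execution_mode = ort_module.ExecutionMode.ORT_SEQUENTIAL
    # Optional: enable all graph optimizations.
    sess_options.graph_optimization_level = (
        ort_module.GraphOptimizationLevel.ORT_ENABLE_ALL
    )
    return sess_options
```

In real use this would be called as `configure_session(ort.SessionOptions(), ort)` after `import onnxruntime as ort`.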
Platform Support
| Platform | Architecture | Support |
|---|---|---|
| Windows 11 | x64 | ✅ Full |
| Windows 11 | ARM64 | ✅ Full |
| Windows 10 (1903+) | x64 | ✅ Full |
| Windows 10 (1903+) | ARM64 | ✅ Full |
| Windows Server 2019+ | x64 | ✅ Full |
| Linux | Any | ❌ No |
| macOS | Any | ❌ No |
Vendor-Specific Performance
NVIDIA GPUs
- Good performance for most models
- Consider CUDA/TensorRT for maximum performance
- DirectML useful for cross-vendor compatibility
AMD GPUs
- Excellent choice for AMD GPUs on Windows
- Often best or only option for AMD acceleration
- Good performance on RDNA architecture
Intel GPUs
- Great for Intel integrated and discrete GPUs
- Alternative to OpenVINO on Windows
- Good performance on Arc and Xe GPUs
Qualcomm (Windows on ARM)
- Primary option for GPU acceleration on ARM
- Optimized for Snapdragon processors
- Consider QNN EP for maximum Snapdragon performance
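The vendor notes above can be turned into a provider preference list. The ordering in this sketch is an illustrative assumption drawn from those notes, not a benchmark result, and `providers_for` is a hypothetical helper:

```python
from typing import Dict, List, Sequence

CPU = "CPUExecutionProvider"
DML = "DmlExecutionProvider"

# Assumed preference order per vendor; adjust after measuring your own models.
VENDOR_PREFERENCES: Dict[str, List[str]] = {
    "nvidia": ["TensorrtExecutionProvider", "CUDAExecutionProvider", DML],
    "amd": [DML],
    "intel": [DML, "OpenVINOExecutionProvider"],
    "qualcomm": ["QNNExecutionProvider", DML],
}


def providers_for(vendor: str, available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are available, with CPU last."""
    preferred = VENDOR_PREFERENCES.get(vendor.lower(), [DML])
    chosen = [p for p in preferred if p in available]
    if CPU in available:
        chosen.append(CPU)
    return chosen
```

Passing `onnxruntime.get_available_providers()` as `available` keeps the list limited to providers that the installed build actually contains.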
Troubleshooting
Provider Not Available
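When `DmlExecutionProvider` is missing, the usual suspects are the operating system, the installed package, or the build itself. A small diagnostic sketch that reports which of these checks fail; `diagnose_directml` is a hypothetical helper and the messages are illustrative:

```python
import importlib.util
import platform
from typing import List


def diagnose_directml() -> List[str]:
    """Return a list of likely reasons DirectML is unavailable (empty if none found)."""
    problems: List[str] = []
    system = platform.system()
    if system != "Windows":
        problems.append(f"DirectML requires Windows; detected {system or 'unknown'}")
    if importlib.util.find_spec("onnxruntime") is None:
        problems.append("onnxruntime is not installed (install onnxruntime-directml)")
        return problems  # nothing further can be checked without the package
    import onnxruntime as ort

    if "DmlExecutionProvider" not in ort.get_available_providers():
        problems.append("the installed onnxruntime build does not include DirectML")
    return problems


if __name__ == "__main__":
    for line in diagnose_directml() or ["No problems detected"]:
        print(line)
```

An empty result does not rule out driver problems, so updating the GPU driver remains worth trying if session creation still fails.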
Performance Issues
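Before tuning, it helps to measure steady-state latency rather than the first call, since early runs commonly include one-time initialization work. A minimal timing sketch; `benchmark` is a hypothetical helper and `run_once` stands for any zero-argument callable, such as a lambda wrapping `session.run(...)`:

```python
import time
from statistics import median
from typing import Callable


def benchmark(run_once: Callable[[], object], warmup: int = 3, iterations: int = 20) -> float:
    """Return the median wall-clock seconds per call, measured after warm-up runs."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    for _ in range(warmup):
        run_once()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return median(samples)
```

Running the same measurement against a CPU-only session gives a baseline for judging whether DirectML is actually accelerating the model.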
Out of Memory
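A common mitigation for out-of-memory errors is to process inputs in smaller batches. This is a general technique rather than anything specific to DirectML; in the sketch below, `chunked` and `run_in_chunks` are hypothetical helpers and `run_batch` stands for whatever function runs inference on a list of inputs:

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def run_in_chunks(run_batch: Callable[[Sequence], List], inputs: Sequence, chunk_size: int) -> List:
    """Run run_batch on each chunk and concatenate the per-chunk outputs in order."""
    outputs: List = []
    for chunk in chunked(inputs, chunk_size):
        outputs.extend(run_batch(chunk))
    return outputs
```

Halving `chunk_size` until the error disappears is a simple way to find a batch size that fits in GPU memory.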
Comparison with Other Providers
| Feature | DirectML | CUDA | TensorRT |
|---|---|---|---|
| Vendor Support | All | NVIDIA only | NVIDIA only |
| Setup Complexity | Easy | Moderate | Complex |
| Performance | Good | Better | Best |
| Windows Integration | Excellent | Good | Good |
| ARM Support | Yes | No | No |
Next Steps
- For NVIDIA GPUs, compare with CUDA and TensorRT
- For Intel hardware, see OpenVINO
- Learn about model optimization