The DirectML execution provider enables GPU acceleration across NVIDIA, AMD, and Intel GPUs on Windows platforms using the DirectX 12 API.
## Requirements

### Hardware

- DirectX 12 capable GPU
- Supported vendors:
  - NVIDIA (GeForce, Quadro, Tesla)
  - AMD (Radeon, Instinct)
  - Intel (Arc, Iris)
  - Qualcomm (Adreno)

### Software

- Windows 10 (version 1903 or later) or Windows 11
- DirectX 12 runtime
- Up-to-date GPU drivers
DirectML is Windows-only and provides cross-vendor GPU support without vendor-specific SDKs.
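One quick way to confirm cross-vendor support at runtime is to check whether the base `onnxruntime` package (assumed to be installed alongside the GenAI wheel) reports the DirectML provider; a minimal sketch:

```python
# Check whether the installed onnxruntime build exposes DirectML.
# Falls back gracefully if onnxruntime is not installed.
try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    available = []

print("DmlExecutionProvider available:", "DmlExecutionProvider" in available)
```

On a non-Windows machine or a CPU-only build this prints `False`, which is expected.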
## Installation

Python:

```shell
pip install onnxruntime-genai-directml --pre
```

C#:

```shell
dotnet add package Microsoft.ML.OnnxRuntimeGenAI.DirectML
```

Alternatively, download the DirectML package from the releases page.
## Basic Configuration

### Python API

```python
import onnxruntime_genai as og

model_path = "path/to/model"

# Create config and set DirectML provider
config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")

# Load model
model = og.Model(config)
tokenizer = og.Tokenizer(model)

# Generate
params = og.GeneratorParams(model)
params.set_search_options(max_length=1024)
generator = og.Generator(model, params)
```
### genai_config.json

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          {
            "dml": {}
          }
        ]
      }
    }
  }
}
```
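Since `genai_config.json` is plain JSON, the provider entry can also be generated programmatically; a minimal sketch using only the standard library, mirroring the layout shown above:

```python
import json

# Build the provider_options fragment for genai_config.json.
config = {
    "model": {
        "decoder": {
            "session_options": {
                "provider_options": [{"dml": {}}]
            }
        }
    }
}

# Serialize with the same indentation style as the hand-written file,
# then round-trip to confirm the structure is valid JSON.
text = json.dumps(config, indent=2)
parsed = json.loads(text)
print(text)
```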
## GPU Selection

### Automatic Selection

By default, DirectML selects the primary GPU. To choose a specific GPU:

```python
import onnxruntime_genai as og

config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")

# Select GPU by device ID
config.set_provider_option("dml", "device_id", "0")

model = og.Model(config)
```
### Device Filtering

Filter GPUs by hardware characteristics:

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          {
            "dml": {},
            "device_filtering_options": {
              "hardware_device_type": "gpu",
              "hardware_device_id": 0,
              "hardware_vendor_id": 4318
            }
          }
        ]
      }
    }
  }
}
```
Vendor IDs:

- NVIDIA: 4318 (0x10DE)
- AMD: 4098 (0x1002)
- Intel: 32902 (0x8086)
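The decimal values above are simply the standard PCI vendor IDs written in base 10 (the config expects decimal, while hardware references usually quote hex). A small sketch showing the correspondence:

```python
# PCI vendor IDs are usually quoted in hex; genai_config.json uses decimal.
VENDOR_IDS = {"NVIDIA": 0x10DE, "AMD": 0x1002, "Intel": 0x8086}

for vendor, vid in VENDOR_IDS.items():
    # prints e.g. "NVIDIA: 4318 (0x10DE)"
    print(f"{vendor}: {vid} (0x{vid:04X})")
```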
## Windows ML Integration

Use Windows ML for automatic device selection:

```python
import onnxruntime_genai as og

# Use Windows ML to register providers
try:
    import winml
    winml.register_execution_providers(ort=False, ort_genai=True)
except ImportError:
    print("WinML not available, using default providers")

config = og.Config(model_path)
model = og.Model(config)
```
## Memory Management

### D3D12 Resource Management

DirectML uses D3D12 resources for GPU memory:

```cpp
// C++ example of DirectML memory management
struct GpuMemory final : DeviceBuffer {
  GpuMemory(size_t size) : owned_{true} {
    size_in_bytes_ = size;
    p_device_ = static_cast<uint8_t*>(ort_allocator_->Alloc(size_in_bytes_));
    // Get D3D12 resource from allocation
    dml_api_->GetD3D12ResourceFromAllocation(
        ort_allocator_, p_device_, &gpu_resource_);
  }

  ComPtr<ID3D12Resource> gpu_resource_;
};
```
### Upload and Readback Heaps

DirectML uses specialized heaps for CPU-GPU transfers:

- **Upload heap**: transfers data from CPU to GPU
- **Readback heap**: transfers results from GPU to CPU

```python
# Efficient memory transfer is handled automatically
params = og.GeneratorParams(model)
params.set_search_options(
    max_length=512,
    batch_size=1
)
```
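The two heap roles can be illustrated with a conceptual sketch in plain Python, where byte buffers stand in for D3D12 heaps (this is illustration only, not the actual DirectML API):

```python
# Conceptual sketch only: staging buffers stand in for D3D12 heaps.
def upload(cpu_data: bytes) -> bytearray:
    staging = bytearray(cpu_data)   # CPU writes into the upload heap
    gpu_local = bytearray(staging)  # copy engine moves it to GPU-local memory
    return gpu_local

def readback(gpu_local: bytearray) -> bytes:
    staging = bytearray(gpu_local)  # copy results into the readback heap
    return bytes(staging)           # CPU reads from the readback heap

result = readback(upload(b"tensor-bytes"))
print(result)
```

The key point the sketch captures is that the CPU never touches GPU-local memory directly; data always passes through a CPU-visible staging buffer in each direction.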
## Configuration Options

### Session Options

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "enable_mem_pattern": true,
        "enable_cpu_mem_arena": false,
        "graph_optimization_level": "ORT_ENABLE_ALL",
        "provider_options": [
          {
            "dml": {
              "performance_preference": "high_performance",
              "disable_metacommands": "false"
            }
          }
        ]
      }
    }
  }
}
```
## Advanced Features

### Command Queue Management

DirectML manages GPU command queues for efficient execution:

```cpp
// DML execution context handles the command queue
struct InterfaceImpl : DeviceInterface {
  void InitOrt(const OrtApi& api, Ort::Allocator& allocator) override {
    dml_execution_context_ = std::make_unique<DmlExecutionContext>(
        dml_objects_.d3d12_device.Get(),
        dml_device_.Get(),
        dml_objects_.command_queue.Get(),
        *ort_allocator_,
        dml_api_);
  }
};
```
### Synchronization

DirectML operations are asynchronous; the provider handles synchronization internally:

```python
import onnxruntime_genai as og

config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")
model = og.Model(config)

# Generation automatically handles GPU synchronization
generator = og.Generator(model, params)
while not generator.is_done():
    generator.generate_next_token()  # Sync handled internally
```
### Multi-GPU Support

DirectML can be configured to use specific GPUs in multi-GPU systems:

```python
import onnxruntime_genai as og

# List available GPUs (Windows ML)
try:
    import winml
    devices = winml.get_available_devices()
    for idx, device in enumerate(devices):
        print(f"GPU {idx}: {device['name']}")
except ImportError:
    pass

# Select a specific GPU
config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")
config.set_provider_option("dml", "device_id", "1")  # Use second GPU

model = og.Model(config)
```
## Troubleshooting

### DirectML Not Available

```python
import onnxruntime_genai as og

try:
    config = og.Config(model_path)
    config.clear_providers()
    config.append_provider("dml")
    model = og.Model(config)
    print("DirectML is available")
except Exception as e:
    print(f"DirectML error: {e}")
    print("Falling back to CPU")
    config = og.Config(model_path)
    model = og.Model(config)
```
Ensure you have the latest GPU drivers installed:

- NVIDIA: GeForce Experience or nvidia.com
- AMD: Adrenalin Software
- Intel: Intel Driver & Support Assistant

Run `dxdiag` and verify that DirectX 12 appears under the Display tab. Also enable High Performance mode in the Windows graphics settings so the discrete GPU is preferred.
### Memory Errors

```python
# Reduce memory usage
params.set_search_options(
    max_length=256,  # Reduce max sequence length
    batch_size=1     # Use a single batch
)
```
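To see why lowering `max_length` helps, consider a rough KV-cache estimate; the model dimensions below are hypothetical, chosen only to illustrate that cache size grows linearly with sequence length:

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim
# * seq_len * bytes per element. All dimensions here are hypothetical.
def kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                   seq_len=1024, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

full = kv_cache_bytes(seq_len=1024)
half = kv_cache_bytes(seq_len=512)
print(f"{full / 2**20:.0f} MiB vs {half / 2**20:.0f} MiB")
```

Halving `max_length` halves the cache for this hypothetical model (512 MiB down to 256 MiB), which is often the difference between fitting in VRAM and failing with an allocation error.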
## Comparison with Other Providers

| Feature | DirectML | CUDA | OpenVINO |
|---------|----------|------|----------|
| Platform | Windows only | Linux/Windows | Cross-platform |
| GPU support | Cross-vendor | NVIDIA only | Intel preferred |
| Setup | Built into Windows | Requires CUDA SDK | Requires OpenVINO |
| Performance | Good | Excellent | Excellent (Intel) |
| Ease of use | Very easy | Moderate | Moderate |
## Best Practices

- **Model selection**: Use FP16 models when available for better performance on modern GPUs.
- **Batch processing**: Keep batch sizes small (1-4) for optimal latency on consumer GPUs.
- **Driver updates**: Regularly update GPU drivers for best DirectML compatibility.
- **Power settings**: Set the Windows power plan to "High Performance" for optimal GPU utilization.
## Next Steps

- **Windows ML Integration**: learn about Windows ML device selection
- **Performance Tuning**: optimize DirectML performance