The DirectML execution provider enables GPU acceleration across NVIDIA, AMD, Intel, and Qualcomm GPUs on Windows using the DirectX 12 API.

Requirements

Hardware

  • DirectX 12 capable GPU
  • Supported vendors:
    • NVIDIA (GeForce, Quadro, Tesla)
    • AMD (Radeon, Instinct)
    • Intel (Arc, Iris)
    • Qualcomm (Adreno)

Software

  • Windows 10 (version 1903 or later) or Windows 11
  • DirectX 12 runtime
  • Updated GPU drivers
DirectML is Windows-only and provides cross-vendor GPU support without vendor-specific SDKs.

Installation

pip install onnxruntime-genai-directml --pre

Basic Configuration

Python API

import onnxruntime_genai as og

model_path = "path/to/model"

# Create config and set DirectML provider
config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")

# Load model
model = og.Model(config)
tokenizer = og.Tokenizer(model)

# Generate
prompt = "What is DirectML?"
params = og.GeneratorParams(model)
params.set_search_options(max_length=1024)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))

genai_config.json

{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          {
            "dml": {}
          }
        ]
      }
    }
  }
}

GPU Selection

Automatic Selection

By default, DirectML selects the primary GPU. To choose a specific GPU:
import onnxruntime_genai as og

config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")

# Select GPU by device ID
config.set_provider_option("dml", "device_id", "0")

model = og.Model(config)

Device Filtering

Filter GPUs by hardware characteristics:
{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          {
            "dml": {},
            "device_filtering_options": {
              "hardware_device_type": "gpu",
              "hardware_device_id": 0,
              "hardware_vendor_id": 4318
            }
          }
        ]
      }
    }
  }
}
Vendor IDs:
  • NVIDIA: 4318 (0x10DE)
  • AMD: 4098 (0x1002)
  • Intel: 32902 (0x8086)
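The decimal values above are simply the standard PCI vendor IDs written in base 10. A quick sketch (pure Python, no DirectML required) showing the hex/decimal correspondence when filling in hardware_vendor_id:

```python
# PCI vendor IDs accepted by hardware_vendor_id (decimal value == hex ID)
VENDOR_IDS = {
    "NVIDIA": 0x10DE,  # 4318
    "AMD":    0x1002,  # 4098
    "Intel":  0x8086,  # 32902
}

for vendor, vid in VENDOR_IDS.items():
    print(f"{vendor}: {vid} (0x{vid:04X})")
```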

Windows ML Integration

Use Windows ML for automatic device selection:
import onnxruntime_genai as og

# Use Windows ML to register providers
try:
    import winml
    winml.register_execution_providers(ort=False, ort_genai=True)
except ImportError:
    print("WinML not available, using default providers")

config = og.Config(model_path)
model = og.Model(config)

Memory Management

D3D12 Resource Management

DirectML uses D3D12 resources for GPU memory:
// C++ example of DirectML memory management
struct GpuMemory final : DeviceBuffer {
  GpuMemory(size_t size) : owned_{true} {
    size_in_bytes_ = size;
    p_device_ = static_cast<uint8_t*>(ort_allocator_->Alloc(size_in_bytes_));
    // Get D3D12 resource from allocation
    dml_api_->GetD3D12ResourceFromAllocation(
      ort_allocator_, p_device_, &gpu_resource_);
  }
  
  ComPtr<ID3D12Resource> gpu_resource_;
};

Upload and Readback Heaps

DirectML uses specialized heaps for CPU-GPU transfers:
  • Upload Heap: Transfers data from CPU to GPU
  • Readback Heap: Transfers results from GPU to CPU
# Efficient memory transfer is handled automatically
params = og.GeneratorParams(model)
params.set_search_options(
    max_length=512,
    batch_size=1
)

Configuration Options

Session Options

{
  "model": {
    "decoder": {
      "session_options": {
        "enable_mem_pattern": true,
        "enable_cpu_mem_arena": false,
        "graph_optimization_level": "ORT_ENABLE_ALL",
        "provider_options": [
          {
            "dml": {
              "performance_preference": "high_performance",
              "disable_metacommands": "false"
            }
          }
        ]
      }
    }
  }
}

Performance Preferences

{
  "dml": {
    "performance_preference": "high_performance"
  }
}
Optimizes for maximum throughput at the cost of power consumption.
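DXCore adapter selection also defines a power-saving preference. Assuming the provider passes the value through unchanged (the "minimum_power" value here is an assumption based on the DXCore adapter preferences, not confirmed by this document), a low-power configuration would look like:

```json
{
  "dml": {
    "performance_preference": "minimum_power"
  }
}
```

This trades throughput for reduced power draw, which can be preferable on battery-powered devices.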

Advanced Features

Command Queue Management

DirectML manages GPU command queues for efficient execution:
// DML execution context handles command queue
struct InterfaceImpl : DeviceInterface {
  void InitOrt(const OrtApi& api, Ort::Allocator& allocator) override {
    dml_execution_context_ = std::make_unique<DmlExecutionContext>(
        dml_objects_.d3d12_device.Get(),
        dml_device_.Get(),
        dml_objects_.command_queue.Get(),
        *ort_allocator_,
        dml_api_);
  }
};

Synchronization

DirectML operations are asynchronous. The provider handles synchronization internally:
import onnxruntime_genai as og

config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")

model = og.Model(config)

# Generation automatically handles GPU synchronization
generator = og.Generator(model, params)
while not generator.is_done():
    generator.generate_next_token()  # Sync handled internally

Multi-GPU Support

DirectML can be configured to use specific GPUs in multi-GPU systems:
import onnxruntime_genai as og

# List available GPUs (Windows ML)
try:
    import winml
    devices = winml.get_available_devices()
    for idx, device in enumerate(devices):
        print(f"GPU {idx}: {device['name']}")
except ImportError:
    print("WinML not available; skipping device enumeration")

# Select specific GPU
config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")
config.set_provider_option("dml", "device_id", "1")  # Use second GPU

model = og.Model(config)

Troubleshooting

DirectML Not Available

import onnxruntime_genai as og

try:
    config = og.Config(model_path)
    config.clear_providers()
    config.append_provider("dml")
    model = og.Model(config)
    print("DirectML is available")
except Exception as e:
    print(f"DirectML error: {e}")
    print("Falling back to CPU")
    config = og.Config(model_path)
    model = og.Model(config)

Performance Issues

Ensure you have the latest GPU drivers installed:
  • NVIDIA: GeForce Experience or nvidia.com
  • AMD: Adrenalin Software
  • Intel: Intel Driver & Support Assistant
Run the DirectX Diagnostic Tool and confirm that DirectX 12 appears under the Display tab:
dxdiag
If performance is still poor, set the performance preference to prioritize throughput:
{
  "dml": {
    "performance_preference": "high_performance"
  }
}

Memory Errors

# Reduce memory usage
params.set_search_options(
    max_length=256,  # Reduce max sequence length
    batch_size=1     # Use single batch
)
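Reducing max_length helps because the KV cache grows linearly with sequence length. A back-of-envelope sketch, using hypothetical model dimensions (32 layers, 32 KV heads of dimension 128, FP16) purely for illustration:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Rough KV-cache size: one K and one V tensor per layer."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# Hypothetical 32-layer model in FP16 (2 bytes per element)
full = kv_cache_bytes(32, 32, 128, seq_len=1024)
reduced = kv_cache_bytes(32, 32, 128, seq_len=256)
print(full // 2**20, reduced // 2**20)  # → 512 128 (MiB)
```

Dropping max_length from 1024 to 256 cuts this hypothetical cache from 512 MiB to 128 MiB.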

Comparison with Other Providers

| Feature | DirectML | CUDA | OpenVINO |
| --- | --- | --- | --- |
| Platform | Windows only | Linux/Windows | Cross-platform |
| GPU Support | Cross-vendor | NVIDIA only | Intel preferred |
| Setup | Built-in Windows | Requires CUDA SDK | Requires OpenVINO |
| Performance | Good | Excellent | Excellent (Intel) |
| Ease of Use | Very Easy | Moderate | Moderate |

Best Practices

Model Selection

Use FP16 models when available for better performance on modern GPUs.
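The FP16 recommendation is largely about memory: halving the bytes per weight halves both the VRAM footprint and the data the GPU must stream per token. A rough estimate for a hypothetical 7B-parameter model (sizes are illustrative, not measured):

```python
def model_bytes(num_params, bytes_per_param):
    """Approximate weight memory, ignoring activations and KV cache."""
    return num_params * bytes_per_param

params_7b = 7_000_000_000
fp32 = model_bytes(params_7b, 4)  # 4 bytes per FP32 weight
fp16 = model_bytes(params_7b, 2)  # 2 bytes per FP16 weight
print(fp32 / 1e9, fp16 / 1e9)  # → 28.0 14.0 (GB)
```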

Batch Processing

Keep batch sizes small (1-4) for optimal latency on consumer GPUs.

Driver Updates

Regularly update GPU drivers for best DirectML compatibility.

Power Settings

Set Windows power plan to “High Performance” for optimal GPU utilization.

Next Steps

Windows ML Integration

Learn about Windows ML device selection

Performance Tuning

Optimize DirectML performance
