The DirectML execution provider enables GPU acceleration across NVIDIA, AMD, and Intel GPUs on Windows platforms using the DirectX 12 API.
## Requirements

### Hardware

- DirectX 12 capable GPU
- Supported vendors:
  - NVIDIA (GeForce, Quadro, Tesla)
  - AMD (Radeon, Instinct)
  - Intel (Arc, Iris)
  - Qualcomm (Adreno)

### Software

- Windows 10 (version 1903 or later) or Windows 11
- DirectX 12 runtime
- Up-to-date GPU drivers
DirectML is Windows-only and provides cross-vendor GPU support without vendor-specific SDKs.
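One quick way to confirm cross-vendor support at runtime is to check whether the base `onnxruntime` package (assumed to be installed alongside the GenAI wheel) reports the DirectML provider; a minimal sketch:

```python
# Check whether the installed onnxruntime build exposes DirectML.
# Falls back gracefully if onnxruntime is not installed.
try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    available = []

print("DmlExecutionProvider available:", "DmlExecutionProvider" in available)
```

On a non-Windows machine or a CPU-only build this prints `False`, which is expected.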
## Installation

Python:

```shell
pip install onnxruntime-genai-directml --pre
```

C#:

```shell
dotnet add package Microsoft.ML.OnnxRuntimeGenAI.DirectML
```

Alternatively, download the DirectML package from the releases page.
## Basic Configuration

### Python API

```python
import onnxruntime_genai as og

model_path = "path/to/model"

# Create config and set DirectML provider
config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")

# Load model
model = og.Model(config)
tokenizer = og.Tokenizer(model)

# Generate
params = og.GeneratorParams(model)
params.set_search_options(max_length=1024)
generator = og.Generator(model, params)
```
### genai_config.json

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          {
            "dml": {}
          }
        ]
      }
    }
  }
}
```
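Since `genai_config.json` is plain JSON, the provider entry can also be generated programmatically; a minimal sketch using only the standard library, mirroring the layout shown above:

```python
import json

# Build the provider_options fragment for genai_config.json.
config = {
    "model": {
        "decoder": {
            "session_options": {
                "provider_options": [{"dml": {}}]
            }
        }
    }
}

# Serialize with the same indentation style as the hand-written file,
# then round-trip to confirm the structure is valid JSON.
text = json.dumps(config, indent=2)
parsed = json.loads(text)
print(text)
```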
## GPU Selection

### Automatic Selection

By default, DirectML selects the primary GPU. To choose a specific GPU:

```python
import onnxruntime_genai as og

config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")

# Select GPU by device ID
config.set_provider_option("dml", "device_id", "0")

model = og.Model(config)
```
### Device Filtering

Filter GPUs by hardware characteristics:

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          {
            "dml": {},
            "device_filtering_options": {
              "hardware_device_type": "gpu",
              "hardware_device_id": 0,
              "hardware_vendor_id": 4318
            }
          }
        ]
      }
    }
  }
}
```
Vendor IDs:

- NVIDIA: 4318 (0x10DE)
- AMD: 4098 (0x1002)
- Intel: 32902 (0x8086)
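The decimal values above are simply the standard PCI vendor IDs written in base 10 (the config expects decimal, while hardware references usually quote hex). A small sketch showing the correspondence:

```python
# PCI vendor IDs are usually quoted in hex; genai_config.json uses decimal.
VENDOR_IDS = {"NVIDIA": 0x10DE, "AMD": 0x1002, "Intel": 0x8086}

for vendor, vid in VENDOR_IDS.items():
    # prints e.g. "NVIDIA: 4318 (0x10DE)"
    print(f"{vendor}: {vid} (0x{vid:04X})")
```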
## Windows ML Integration

Use Windows ML for automatic device selection:

```python
import onnxruntime_genai as og

# Use Windows ML to register providers
try:
    import winml
    winml.register_execution_providers(ort=False, ort_genai=True)
except ImportError:
    print("WinML not available, using default providers")

config = og.Config(model_path)
model = og.Model(config)
```
## Memory Management

### D3D12 Resource Management

DirectML uses D3D12 resources for GPU memory:

```cpp
// C++ example of DirectML memory management
struct GpuMemory final : DeviceBuffer {
  GpuMemory(size_t size) : owned_{true} {
    size_in_bytes_ = size;
    p_device_ = static_cast<uint8_t*>(ort_allocator_->Alloc(size_in_bytes_));
    // Get D3D12 resource from allocation
    dml_api_->GetD3D12ResourceFromAllocation(
        ort_allocator_, p_device_, &gpu_resource_);
  }

  ComPtr<ID3D12Resource> gpu_resource_;
};
```
### Upload and Readback Heaps

DirectML uses specialized heaps for CPU-GPU transfers:

- **Upload heap**: transfers data from CPU to GPU
- **Readback heap**: transfers results from GPU to CPU

```python
# Efficient memory transfer is handled automatically
params = og.GeneratorParams(model)
params.set_search_options(
    max_length=512,
    batch_size=1
)
```
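The two heap roles can be illustrated with a conceptual sketch in plain Python, where byte buffers stand in for D3D12 heaps (this is illustration only, not the actual DirectML API):

```python
# Conceptual sketch only: staging buffers stand in for D3D12 heaps.
def upload(cpu_data: bytes) -> bytearray:
    staging = bytearray(cpu_data)   # CPU writes into the upload heap
    gpu_local = bytearray(staging)  # copy engine moves it to GPU-local memory
    return gpu_local

def readback(gpu_local: bytearray) -> bytes:
    staging = bytearray(gpu_local)  # copy results into the readback heap
    return bytes(staging)           # CPU reads from the readback heap

result = readback(upload(b"tensor-bytes"))
print(result)
```

The key point the sketch captures is that the CPU never touches GPU-local memory directly; data always passes through a CPU-visible staging buffer in each direction.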
## Configuration Options

### Session Options

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "enable_mem_pattern": true,
        "enable_cpu_mem_arena": false,
        "graph_optimization_level": "ORT_ENABLE_ALL",
        "provider_options": [
          {
            "dml": {
              "performance_preference": "high_performance",
              "disable_metacommands": "false"
            }
          }
        ]
      }
    }
  }
}
```
## Advanced Features

### Command Queue Management

DirectML manages GPU command queues for efficient execution:

```cpp
// DML execution context handles the command queue
struct InterfaceImpl : DeviceInterface {
  void InitOrt(const OrtApi& api, Ort::Allocator& allocator) override {
    dml_execution_context_ = std::make_unique<DmlExecutionContext>(
        dml_objects_.d3d12_device.Get(),
        dml_device_.Get(),
        dml_objects_.command_queue.Get(),
        *ort_allocator_,
        dml_api_);
  }
};
```
### Synchronization

DirectML operations are asynchronous; the provider handles synchronization internally:

```python
import onnxruntime_genai as og

config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")
model = og.Model(config)

# Generation automatically handles GPU synchronization
generator = og.Generator(model, params)
while not generator.is_done():
    generator.generate_next_token()  # Sync handled internally
```
### Multi-GPU Support

DirectML can be configured to use specific GPUs in multi-GPU systems:

```python
import onnxruntime_genai as og

# List available GPUs (Windows ML)
try:
    import winml
    devices = winml.get_available_devices()
    for idx, device in enumerate(devices):
        print(f"GPU {idx}: {device['name']}")
except ImportError:
    pass

# Select a specific GPU
config = og.Config(model_path)
config.clear_providers()
config.append_provider("dml")
config.set_provider_option("dml", "device_id", "1")  # Use second GPU

model = og.Model(config)
```
## Troubleshooting

### DirectML Not Available

```python
import onnxruntime_genai as og

try:
    config = og.Config(model_path)
    config.clear_providers()
    config.append_provider("dml")
    model = og.Model(config)
    print("DirectML is available")
except Exception as e:
    print(f"DirectML error: {e}")
    print("Falling back to CPU")
    config = og.Config(model_path)
    model = og.Model(config)
```
Ensure you have the latest GPU drivers installed:

- NVIDIA: GeForce Experience or nvidia.com
- AMD: Adrenalin Software
- Intel: Intel Driver & Support Assistant

Run `dxdiag` and verify that DirectX 12 appears under the Display tab. Also enable High Performance mode in the Windows graphics settings so the discrete GPU is preferred.
### Memory Errors

```python
# Reduce memory usage
params.set_search_options(
    max_length=256,  # Reduce max sequence length
    batch_size=1     # Use a single batch
)
```
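To see why lowering `max_length` helps, consider a rough KV-cache estimate; the model dimensions below are hypothetical, chosen only to illustrate that cache size grows linearly with sequence length:

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim
# * seq_len * bytes per element. All dimensions here are hypothetical.
def kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                   seq_len=1024, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

full = kv_cache_bytes(seq_len=1024)
half = kv_cache_bytes(seq_len=512)
print(f"{full / 2**20:.0f} MiB vs {half / 2**20:.0f} MiB")
```

Halving `max_length` halves the cache for this hypothetical model (512 MiB down to 256 MiB), which is often the difference between fitting in VRAM and failing with an allocation error.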
## Comparison with Other Providers

| Feature | DirectML | CUDA | OpenVINO |
|---------|----------|------|----------|
| Platform | Windows only | Linux/Windows | Cross-platform |
| GPU support | Cross-vendor | NVIDIA only | Intel preferred |
| Setup | Built into Windows | Requires CUDA SDK | Requires OpenVINO |
| Performance | Good | Excellent | Excellent (Intel) |
| Ease of use | Very easy | Moderate | Moderate |
## Best Practices

- **Model selection**: Use FP16 models when available for better performance on modern GPUs.
- **Batch processing**: Keep batch sizes small (1-4) for optimal latency on consumer GPUs.
- **Driver updates**: Regularly update GPU drivers for best DirectML compatibility.
- **Power settings**: Set the Windows power plan to "High Performance" for optimal GPU utilization.
## Next Steps

- **Windows ML Integration**: learn about Windows ML device selection
- **Performance Tuning**: optimize DirectML performance