OpenVINO Execution Provider

The OpenVINO Execution Provider enables accelerated inference on Intel CPUs, integrated and discrete GPUs, and VPUs (Vision Processing Units) using the Intel OpenVINO toolkit.

When to Use OpenVINO EP

Use the OpenVINO Execution Provider when:
  • You’re running on Intel CPUs (especially Xeon or Core processors)
  • You have Intel integrated GPUs (Iris Xe, UHD Graphics)
  • You’re using Intel discrete GPUs (Arc, Flex, Max series)
  • You have Intel VPUs or Movidius devices
  • You need optimized inference on Intel hardware
  • You want to deploy on edge devices with Intel processors

Key Features

  • Intel Hardware Optimization: Leverages Intel CPU extensions (AVX2, AVX-512, VNNI)
  • Multi-Device Support: CPU, GPU, VPU in a single framework
  • Graph Optimizations: Advanced model optimizations for Intel hardware
  • Dynamic Shapes: Efficient handling of variable input sizes
  • Precision Modes: FP32, FP16, INT8 quantization support
  • Heterogeneous Execution: Can split workload across different devices

Prerequisites

Hardware Support

CPUs:
  • Intel Core processors (6th gen and newer recommended)
  • Intel Xeon processors (Skylake and newer)
  • Supports SSE4.2, AVX2, AVX-512, VNNI instructions
GPUs:
  • Intel Integrated Graphics (HD Graphics 6xx and newer)
  • Intel Iris Xe Graphics
  • Intel Arc Graphics (A-series)
  • Intel Data Center GPU Flex/Max series
VPUs:
  • Intel Movidius Myriad X
  • Intel Vision Processing Units
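To see which of the devices above are actually visible on a given machine, the OpenVINO runtime itself can be queried. A minimal sketch, assuming the `openvino` package is installed (it degrades gracefully when it is not):

```python
# Probe OpenVINO for visible devices and their supported precisions.
try:
    from openvino.runtime import Core

    core = Core()
    devices = core.available_devices  # e.g. ['CPU', 'GPU']
    for device in devices:
        # OPTIMIZATION_CAPABILITIES lists supported precisions/features
        # for the device, such as FP32, BF16, INT8
        caps = core.get_property(device, "OPTIMIZATION_CAPABILITIES")
        print(f"{device}: {caps}")
except ImportError:
    devices = []
    print("OpenVINO Runtime not installed")
```

If a device you expect (e.g. `GPU`) is missing from the list, check the driver installation before debugging ONNX Runtime itself.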

Software Requirements

  • OpenVINO Runtime: 2024.0 or newer recommended
  • ONNX Runtime with OpenVINO support
  • Intel GPU drivers (for GPU execution)

Installation

Python

# Install ONNX Runtime
pip install onnxruntime

# Install OpenVINO Runtime (if not already installed)
pip install openvino

# Verify OpenVINO is available
python -c "import onnxruntime as ort; print(ort.get_available_providers())"
# 'OpenVINOExecutionProvider' only appears in builds with OpenVINO support,
# such as the onnxruntime-openvino package described below

Using Intel Distribution

# Intel optimized Python distribution
pip install onnxruntime-openvino

# Or build from source with OpenVINO support
# See: https://onnxruntime.ai/docs/build/eps.html#openvino

C++

Download pre-built binaries or build from source with OpenVINO support:
# Download OpenVINO
wget https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.0/linux/

# Build ONNX Runtime with OpenVINO
git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime
./build.sh --config Release --use_openvino CPU_FP32 --build_shared_lib --parallel

Basic Usage

Python

import onnxruntime as ort
import numpy as np

# Create session with OpenVINO provider
session = ort.InferenceSession(
    "model.onnx",
    providers=['OpenVINOExecutionProvider', 'CPUExecutionProvider']
)

# Prepare input
input_name = session.get_inputs()[0].name
x = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Run inference
results = session.run(None, {input_name: x})

C++

#include <onnxruntime_cxx_api.h>

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "OpenVINOExample");
Ort::SessionOptions session_options;

// Add OpenVINO provider for CPU with FP32 precision
Ort::ThrowOnError(
    OrtSessionOptionsAppendExecutionProvider_OpenVINO(
        session_options, "CPU_FP32"
    )
);

Ort::Session session(env, "model.onnx", session_options);

// Run inference (input_names, input_tensor, and output_names prepared beforehand)
auto output_tensors = session.Run(Ort::RunOptions{nullptr}, 
                                   input_names.data(), 
                                   &input_tensor, 1,
                                   output_names.data(), 1);

C#

using Microsoft.ML.OnnxRuntime;

var sessionOptions = new SessionOptions();
sessionOptions.AppendExecutionProvider_OpenVINO("CPU_FP32");

using var session = new InferenceSession("model.onnx", sessionOptions);

Configuration Options

Device Types

OpenVINO supports multiple device types with different precision modes:
import onnxruntime as ort

# CPU with FP32 precision (default)
session = ort.InferenceSession(
    "model.onnx",
    providers=[('OpenVINOExecutionProvider', {
        'device_type': 'CPU_FP32'
    })]
)

# CPU with FP16 precision (if supported)
session = ort.InferenceSession(
    "model.onnx",
    providers=[('OpenVINOExecutionProvider', {
        'device_type': 'CPU_FP16'
    })]
)

# Intel GPU with FP32 precision
session = ort.InferenceSession(
    "model.onnx",
    providers=[('OpenVINOExecutionProvider', {
        'device_type': 'GPU_FP32'
    })]
)

# Intel GPU with FP16 precision (better performance)
session = ort.InferenceSession(
    "model.onnx",
    providers=[('OpenVINOExecutionProvider', {
        'device_type': 'GPU_FP16'
    })]
)

# VPU/Myriad device (older OpenVINO releases only;
# Myriad/VPU plugin support was removed in recent OpenVINO versions)
session = ort.InferenceSession(
    "model.onnx",
    providers=[('OpenVINOExecutionProvider', {
        'device_type': 'MYRIAD_FP16'
    })]
)

Available Device Types

| Device Type    | Description                    | Typical Use Case             |
|----------------|--------------------------------|------------------------------|
| CPU_FP32       | CPU with 32-bit floating point | General purpose, development |
| CPU_FP16       | CPU with 16-bit floating point | Memory-constrained systems   |
| GPU_FP32       | Intel GPU with 32-bit float    | GPU acceleration, balanced   |
| GPU_FP16       | Intel GPU with 16-bit float    | Maximum GPU performance      |
| MYRIAD_FP16    | Intel VPU/Movidius             | Edge devices, low power      |
| HETERO:GPU,CPU | Heterogeneous execution        | Fallback support             |
| MULTI:GPU,CPU  | Multi-device execution         | Load balancing               |

Advanced Configuration

import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=[(
        'OpenVINOExecutionProvider', {
            # Device selection
            'device_type': 'GPU_FP16',
            
            # Performance hints
            'enable_vpu_fast_compile': False,
            'num_of_threads': 8,
            
            # Cache settings
            'enable_opencl_throttling': False,
            'cache_dir': '/tmp/openvino_cache',
        }
    )]
)

Device Selection

Querying Available Devices

import onnxruntime as ort

# Check available providers
available = ort.get_available_providers()
if 'OpenVINOExecutionProvider' in available:
    print("OpenVINO is available")
    
# To query specific OpenVINO devices, use OpenVINO Python API
try:
    from openvino.runtime import Core
    core = Core()
    devices = core.available_devices
    print(f"Available OpenVINO devices: {devices}")
    for device in devices:
        print(f"{device}: {core.get_property(device, 'FULL_DEVICE_NAME')}")
except ImportError:
    print("OpenVINO Python API not installed")

CPU Optimization

import onnxruntime as ort

# Optimize for Intel CPU
session = ort.InferenceSession(
    "model.onnx",
    providers=[(
        'OpenVINOExecutionProvider', {
            'device_type': 'CPU_FP32',
            'num_of_threads': 0,  # Auto-detect optimal thread count
        }
    )]
)

GPU Optimization

import onnxruntime as ort

# Optimize for Intel GPU
session = ort.InferenceSession(
    "model.onnx",
    providers=[(
        'OpenVINOExecutionProvider', {
            'device_type': 'GPU_FP16',  # FP16 for better performance
            'enable_opencl_throttling': False,
        }
    )]
)

Heterogeneous Execution

Split workload across multiple devices:
import onnxruntime as ort

# HETERO: run ops on GPU where supported, fall back to CPU for the rest
session = ort.InferenceSession(
    "model.onnx",
    providers=[(
        'OpenVINOExecutionProvider', {
            'device_type': 'HETERO:GPU,CPU'
        }
    )]
)

# Multi-device for load balancing
session = ort.InferenceSession(
    "model.onnx",
    providers=[(
        'OpenVINOExecutionProvider', {
            'device_type': 'MULTI:GPU,CPU'
        }
    )]
)

Performance Optimization

Model Caching

OpenVINO compiles models on first run. Enable caching to speed up subsequent loads:
import onnxruntime as ort
import os

# Set cache directory
os.makedirs('/tmp/openvino_cache', exist_ok=True)

session = ort.InferenceSession(
    "model.onnx",
    providers=[(
        'OpenVINOExecutionProvider', {
            'device_type': 'CPU_FP32',
            'cache_dir': '/tmp/openvino_cache',
        }
    )]
)

# First run: compiles and caches model
result = session.run(None, {input_name: x})

# Subsequent runs: loads from cache (much faster)

Dynamic Shapes

OpenVINO handles dynamic shapes efficiently:
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession(
    "model_dynamic.onnx",
    providers=['OpenVINOExecutionProvider']
)

# Run with different input sizes
for batch_size in [1, 4, 8, 16]:
    x = np.random.randn(batch_size, 3, 224, 224).astype(np.float32)
    result = session.run(None, {input_name: x})
    print(f"Batch size {batch_size}: processed")

Quantization (INT8)

For INT8 models, OpenVINO provides automatic optimization:
import onnxruntime as ort

# Load quantized (INT8) model
session = ort.InferenceSession(
    "model_int8.onnx",
    providers=[(
        'OpenVINOExecutionProvider', {
            'device_type': 'CPU_FP32',  # Will use INT8 ops if available
        }
    )]
)

Platform Support

| Platform | Architecture | Support         |
|----------|--------------|-----------------|
| Linux    | x64          | ✅ Full         |
| Linux    | ARM64        | ✅ Limited      |
| Windows  | x64          | ✅ Full         |
| Windows  | ARM64        | ⚠️ Experimental |
| macOS    | x64          | ✅ Full         |
| macOS    | ARM64        | ⚠️ Limited      |

Use Cases

Edge Deployment

import onnxruntime as ort

# Optimized for edge device with Intel CPU
session = ort.InferenceSession(
    "model.onnx",
    providers=[(
        'OpenVINOExecutionProvider', {
            'device_type': 'CPU_FP32',
            'num_of_threads': 4,  # Limit threads on edge device
        }
    )]
)

Cloud Inference (Intel Xeon)

import onnxruntime as ort

# Maximize throughput on Xeon server
session = ort.InferenceSession(
    "model.onnx",
    providers=[(
        'OpenVINOExecutionProvider', {
            'device_type': 'CPU_FP32',
            'num_of_threads': 0,  # Use all cores
        }
    )]
)

Intel Arc GPU

import onnxruntime as ort

# Leverage Intel discrete GPU
session = ort.InferenceSession(
    "model.onnx",
    providers=[(
        'OpenVINOExecutionProvider', {
            'device_type': 'GPU_FP16',
        }
    )]
)

Performance Comparison

Typical performance improvements over standard CPU execution:
| Hardware             | Precision | Speedup | Notes              |
|----------------------|-----------|---------|--------------------|
| Intel Xeon (AVX-512) | FP32      | 2-4x    | vs standard CPU EP |
| Intel Core i7/i9     | FP32      | 1.5-3x  | vs standard CPU EP |
| Intel Iris Xe GPU    | FP16      | 3-6x    | vs CPU             |
| Intel Arc GPU        | FP16      | 5-10x   | vs CPU             |
| Movidius VPU         | FP16      | 2-5x    | Low power          |

Troubleshooting

Provider Not Available

import onnxruntime as ort

print(ort.get_available_providers())
# If 'OpenVINOExecutionProvider' is missing:
# 1. Install OpenVINO: pip install openvino
# 2. Check ONNX Runtime build has OpenVINO support
# 3. Verify Intel hardware is present

GPU Not Detected

# Check Intel GPU drivers (Linux)
sudo apt-get install intel-opencl-icd

# Check available devices
python -c "from openvino.runtime import Core; print(Core().available_devices)"

Performance Issues

# Enable verbose logging
import onnxruntime as ort
ort.set_default_logger_severity(0)  # Verbose

session = ort.InferenceSession(
    "model.onnx",
    providers=['OpenVINOExecutionProvider']
)

# Check which device is being used
print(session.get_providers())

Compilation Errors

# Some models may not be fully supported
# Use heterogeneous execution as fallback
session = ort.InferenceSession(
    "model.onnx",
    providers=[(
        'OpenVINOExecutionProvider', {
            'device_type': 'HETERO:CPU,GPU'
        }
    ), 'CPUExecutionProvider']
)

Comparison with Other Providers

| Feature          | OpenVINO  | oneDNN  | CUDA      |
|------------------|-----------|---------|-----------|
| Intel CPU        | Excellent | Good    | N/A       |
| Intel GPU        | Excellent | N/A     | N/A       |
| NVIDIA GPU       | N/A       | N/A     | Excellent |
| Edge Devices     | Excellent | Limited | Limited   |
| Setup Complexity | Moderate  | Easy    | Moderate  |

Next Steps