# Available Backends

Retto uses ONNX Runtime under the hood, supporting multiple execution providers:

- **CPU** — universal fallback; works everywhere
- **CUDA** — NVIDIA GPU acceleration
- **DirectML** — Windows GPU acceleration
- **WebAssembly** — browser-based execution
## CPU Backend

The CPU backend is always available and requires no special configuration:

```toml
[dependencies]
retto-core = { version = "0.1.5", features = ["backend-ort"] }
```

```rust
use retto_core::prelude::*;

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::CPU,
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};
```

The CPU backend uses optimized SIMD instructions on x86_64 and ARM architectures.
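You can check at runtime which SIMD extensions your CPU actually exposes. The sketch below uses only the Rust standard library's feature-detection macros and is not part of the Retto API:

```rust
/// Names of SIMD extensions detected at runtime on this CPU.
/// ONNX Runtime's CPU kernels select wider vector code paths
/// when these extensions are present.
fn detected_simd() -> Vec<&'static str> {
    let mut found = Vec::new();
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") { found.push("sse2"); }
        if is_x86_feature_detected!("avx2") { found.push("avx2"); }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") { found.push("neon"); }
    }
    found
}

fn main() {
    println!("SIMD extensions in use: {:?}", detected_simd());
}
```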
## CUDA Backend

Accelerate inference with NVIDIA GPUs using CUDA.

### Requirements

- NVIDIA GPU with CUDA support
- CUDA Toolkit 11.x or 12.x
- cuDNN library

### Enable Feature

```toml
[dependencies]
retto-core = { version = "0.1.5", features = ["backend-ort-cuda"] }
```

For the CLI:

```sh
cargo build --release --features backend-ort-cuda
```

```rust
use retto_core::prelude::*;

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::Cuda(0), // Use GPU 0
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};

let mut session = RettoSession::new(cfg)?;
```
### Multi-GPU Selection

Specify which GPU to use by device index:

```rust
// Use the first GPU
device: RettoOrtWorkerDevice::Cuda(0)

// Use the second GPU
device: RettoOrtWorkerDevice::Cuda(1)
```
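If the device index should be configurable at run time rather than hard-coded, one option is to read it from an environment variable. The variable name `RETTO_GPU_ID` below is an illustrative choice, not something Retto defines:

```rust
use std::env;

/// Parse a GPU index from the (hypothetical) RETTO_GPU_ID variable,
/// defaulting to device 0 when it is unset or unparsable.
fn gpu_id_from_env() -> i32 {
    env::var("RETTO_GPU_ID")
        .ok()
        .and_then(|s| s.parse::<i32>().ok())
        .unwrap_or(0)
}

fn main() {
    let id = gpu_id_from_env();
    println!("using CUDA device {id}");
    // ...then: device: RettoOrtWorkerDevice::Cuda(id)
}
```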
### CUDA Configuration

From ort_worker.rs:154, the CUDA execution provider is configured with:

```rust
RettoOrtWorkerDevice::Cuda(id) => providers.push(
    CUDAExecutionProvider::default()
        .with_arena_extend_strategy(NextPowerOfTwo)
        .with_conv_algorithm_search(Exhaustive)
        .with_device_id(id)
        .build(),
)
```

**Arena strategy:** memory allocation uses the NextPowerOfTwo strategy for efficient GPU memory management.

**Conv algorithm:** exhaustive search finds the fastest convolution algorithm for your specific GPU.
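The effect of the NextPowerOfTwo arena strategy can be illustrated with Rust's own integer helper: each growth request is rounded up to the next power of two, so repeated small growths reuse a few large blocks instead of fragmenting GPU memory. A standalone illustration, not Retto code:

```rust
fn main() {
    // Round arena growth requests up to a power of two, as the
    // NextPowerOfTwo extend strategy does for GPU allocations.
    for request in [1_000u64, 5_000, 70_000, 2_000_000] {
        println!("{request} bytes -> {} bytes", request.next_power_of_two());
    }
    // 1000 -> 1024, 5000 -> 8192, 70000 -> 131072, 2000000 -> 2097152
}
```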
## DirectML Backend

DirectML provides GPU acceleration on Windows for AMD, Intel, and NVIDIA GPUs.

### Requirements

- Windows 10 version 1903 or later
- DirectX 12 capable GPU

### Enable Feature

```toml
[dependencies]
retto-core = { version = "0.1.5", features = ["backend-ort-directml"] }
```

```rust
use retto_core::prelude::*;

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::DirectML(0),
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};
```
### DirectML Configuration

From ort_worker.rs:162, DirectML setup:

```rust
RettoOrtWorkerDevice::DirectML(id) => providers.push(
    DirectMLExecutionProvider::default()
        .with_device_id(id)
        .build(),
)
```

DirectML is particularly useful for AMD GPUs on Windows, which don't support CUDA.
## WebAssembly Backend

Run OCR in web browsers using WebAssembly.

### Enable Feature

```toml
[dependencies]
retto-core = { version = "0.1.5", features = ["backend-ort-wasm"] }
```

### Initialization

From ort_worker.rs:144, WASM requires explicit initialization:

```rust
#[cfg(target_family = "wasm")]
{
    ort::init()
        .with_global_thread_pool(ort::environment::GlobalThreadPoolOptions::default())
        .commit()
        .expect("Cannot initialize ort.");
}
```

WebAssembly execution is significantly slower than native backends; consider server-side processing for production use.
## Backend Selection Logic

From ort_worker.rs:152, Retto uses a fallback chain (simplified):

```rust
let mut providers = Vec::new();

// Try GPU backends first
match cfg.device {
    RettoOrtWorkerDevice::Cuda(id) => {
        providers.push(CUDAExecutionProvider::default().with_device_id(id).build());
    }
    RettoOrtWorkerDevice::DirectML(id) => {
        providers.push(DirectMLExecutionProvider::default().with_device_id(id).build());
    }
    _ => {}
};

// Always add CPU as a fallback
providers.push(CPUExecutionProvider::default().build());
```

If the GPU backend fails to initialize, execution automatically falls back to CPU.
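The same first-available pattern can be written generically. Below is a self-contained sketch of a priority chain; the backend names and initializers are stand-ins, not the ort API:

```rust
/// Try each backend initializer in priority order and return the name
/// of the first one that succeeds, mirroring ONNX Runtime's provider
/// fallback behavior.
fn pick_backend(inits: &[(&'static str, fn() -> bool)]) -> &'static str {
    inits
        .iter()
        .find(|(_, init)| init())
        .map(|(name, _)| *name)
        .unwrap_or("none")
}

fn main() {
    let chain: &[(&'static str, fn() -> bool)] = &[
        ("cuda", || false), // e.g. no NVIDIA driver present
        ("cpu", || true),   // CPU always initializes
    ];
    println!("selected backend: {}", pick_backend(chain));
}
```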
## Backend Comparison

### CPU

**Performance:** baseline (1x)

**Pros:**
- Universal compatibility
- No extra dependencies
- Stable and reliable

**Cons:**
- Slower inference
- Limited parallelism

**Use cases:**
- Development
- Low-volume processing
- Deployment without GPU access

### CUDA

**Performance:** 5-20x faster than CPU

**Pros:**
- Excellent performance
- Wide hardware support
- Mature ecosystem

**Cons:**
- NVIDIA GPUs only
- CUDA Toolkit required
- Higher setup complexity

**Use cases:**
- High-volume batch processing
- Real-time applications
- Data centers with NVIDIA GPUs

### DirectML

**Performance:** 3-10x faster than CPU

**Pros:**
- Works with AMD/Intel/NVIDIA GPUs
- Built into Windows
- Easy setup

**Cons:**
- Windows only
- Slightly slower than CUDA
- Less optimization

**Use cases:**
- Windows desktop applications
- AMD GPU systems
- Cross-vendor GPU support

### WebAssembly

**Performance:** 0.1-0.3x CPU speed

**Pros:**
- Runs in browsers
- No server required
- Privacy-preserving

**Cons:**
- Very slow
- Limited by JavaScript
- Large WASM binary

**Use cases:**
- Client-side demos
- Privacy-sensitive applications
- Offline web apps
## CLI Backend Selection

The CLI supports runtime backend selection:

```sh
retto-cli --device cpu --images ./photos/
```

From main.rs:46, the CLI parses device configuration:

```rust
let device = match cli.device {
    DeviceKind::Cpu => RettoOrtWorkerDevice::CPU,
    #[cfg(feature = "backend-ort-cuda")]
    DeviceKind::Cuda => RettoOrtWorkerDevice::Cuda(cli.device_id),
    #[cfg(feature = "backend-ort-directml")]
    DeviceKind::DirectMl => RettoOrtWorkerDevice::DirectML(cli.device_id),
};
```
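The mapping from a `--device` string to a variant can be sketched independently of the CLI framework. The enum and flag names below are assumptions modeled on the CLI above, not the actual retto-cli types:

```rust
#[derive(Debug, PartialEq)]
enum Device {
    Cpu,
    Cuda(u32),
    DirectMl(u32),
}

/// Parse a device flag such as "cpu", "cuda", or "directml",
/// pairing GPU backends with a caller-supplied device id.
fn parse_device(name: &str, id: u32) -> Option<Device> {
    match name.to_ascii_lowercase().as_str() {
        "cpu" => Some(Device::Cpu),
        "cuda" => Some(Device::Cuda(id)),
        "directml" => Some(Device::DirectMl(id)),
        _ => None, // unknown backend name
    }
}

fn main() {
    println!("{:?}", parse_device("cuda", 1));
}
```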
## Choosing the Right Backend

1. **Start with CPU.** Begin development with the CPU backend for maximum compatibility.
2. **Profile your workload.** Measure throughput and latency requirements.
3. **Enable GPU if needed:**
   - CUDA for NVIDIA GPUs and maximum performance
   - DirectML for Windows with any GPU
   - WASM for browser-based processing
4. **Optimize configuration.** Tune device IDs, batch sizes, and threading for your specific hardware.
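For the threading knob, a common starting point is the number of cores the OS makes available to the process. A sketch using only the standard library; how the resulting count is handed to the runtime (e.g. via ORT_NUM_THREADS) is up to you:

```rust
use std::thread;

/// Pick a default worker thread count: all available cores,
/// falling back to 1 if the count cannot be determined.
fn default_threads() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn main() {
    println!("suggested thread count: {}", default_threads());
}
```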
## Troubleshooting

### CUDA Initialization Fails

Error: `CUDA execution provider is not available`

Solutions:
- Verify CUDA Toolkit installation: `nvcc --version`
- Check GPU availability: `nvidia-smi`
- Ensure cuDNN is installed
- Rebuild with `--features backend-ort-cuda`

### DirectML Not Available

Error: `DirectML execution provider is not available`

Solutions:
- Update Windows to version 1903 or later
- Verify GPU drivers are up to date
- Check DirectX 12 support

### WASM Out of Memory

Error: `Failed to allocate memory`

Solutions:
- Reduce image size before processing
- Use lower-resolution models
- Increase browser memory limits
## Environment Variables

Control ONNX Runtime behavior:

```sh
# Enable debug logging
export ORT_LOG_LEVEL=VERBOSE

# Set thread count
export ORT_NUM_THREADS=4

# Disable telemetry
export ORT_DISABLE_TELEMETRY=1
```
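These variables can also be set from inside the process. A minimal sketch using only the standard library; note this must run before the runtime initializes, and whether a given variable is honored depends on when ONNX Runtime reads the environment:

```rust
use std::env;

/// Set ONNX Runtime environment variables programmatically.
/// Must be called before the runtime is initialized.
fn configure_ort_env() {
    env::set_var("ORT_LOG_LEVEL", "VERBOSE");
    env::set_var("ORT_NUM_THREADS", "4");
}

fn main() {
    configure_ort_env();
    println!("ORT_LOG_LEVEL = {}", env::var("ORT_LOG_LEVEL").unwrap());
}
```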
## Next Steps

- **Model Loading** — optimize model loading for your backend
- **Rust Usage** — build applications with Retto