Available Backends

Retto uses ONNX Runtime under the hood, supporting multiple execution providers:

  • CPU: universal fallback, works everywhere
  • CUDA: NVIDIA GPU acceleration
  • DirectML: Windows GPU acceleration
  • WebAssembly: browser-based execution

CPU Backend

The CPU backend is always available and requires no special configuration:
Cargo.toml
[dependencies]
retto-core = { version = "0.1.5", features = ["backend-ort"] }
Then configure the session:
use retto_core::prelude::*;

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::CPU,
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};
The CPU backend uses optimized SIMD instructions on x86_64 and ARM architectures.

CUDA Backend

Accelerate inference with NVIDIA GPUs using CUDA.

Requirements

  • NVIDIA GPU with CUDA support
  • CUDA Toolkit 11.x or 12.x
  • cuDNN library

Enable Feature

Cargo.toml
[dependencies]
retto-core = { version = "0.1.5", features = ["backend-ort-cuda"] }
For the CLI:
cargo build --release --features backend-ort-cuda

Configure Device

use retto_core::prelude::*;

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::Cuda(0),  // Use GPU 0
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};

let mut session = RettoSession::new(cfg)?;

Multi-GPU Selection

Specify which GPU to use:
// Use the first GPU
device: RettoOrtWorkerDevice::Cuda(0)

// Use the second GPU
device: RettoOrtWorkerDevice::Cuda(1)
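To switch GPUs without recompiling, the device index can come from the environment. This is a small illustrative sketch, not part of retto's API; the `RETTO_GPU_ID` variable name and the `gpu_id_from` helper are assumptions:

```rust
use std::env;

// Hypothetical helper (not part of retto): pick a CUDA device index from an
// optional string (e.g. an environment variable), falling back to a default
// when the value is missing or not a number.
fn gpu_id_from(value: Option<String>, default: i32) -> i32 {
    value.and_then(|v| v.parse().ok()).unwrap_or(default)
}

fn main() {
    // RETTO_GPU_ID is an assumed variable name; retto itself does not read it.
    let id = gpu_id_from(env::var("RETTO_GPU_ID").ok(), 0);
    println!("selected CUDA device {id}");
    // The result then feeds the config shown earlier:
    // device: RettoOrtWorkerDevice::Cuda(id)
}
```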

CUDA Configuration

From ort_worker.rs:154, the CUDA execution provider is configured with:
RettoOrtWorkerDevice::Cuda(id) => providers.push(
    CUDAExecutionProvider::default()
        .with_arena_extend_strategy(NextPowerOfTwo)
        .with_conv_algorithm_search(Exhaustive)
        .with_device_id(id)
        .build(),
)
Memory allocation uses NextPowerOfTwo strategy for efficient GPU memory management.
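The effect of the NextPowerOfTwo strategy can be shown with plain arithmetic: each arena extension request is rounded up to the next power of two, so the arena roughly doubles instead of growing by many small increments. A standalone sketch of the rounding rule (not ONNX Runtime's actual allocator code):

```rust
// Round an allocation request up to the next power of two, as the
// NextPowerOfTwo arena-extend strategy does conceptually.
fn arena_extend_size(requested_bytes: u64) -> u64 {
    requested_bytes.next_power_of_two()
}

fn main() {
    for req in [1_000u64, 70_000, 3_000_000] {
        // 1_000 -> 1_024, 70_000 -> 131_072, 3_000_000 -> 4_194_304
        println!("request {req:>9} B -> extend to {} B", arena_extend_size(req));
    }
}
```

Because successive extensions double in size, a workload with steadily growing tensors triggers only a logarithmic number of extensions, at the cost of some over-allocation.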

DirectML Backend

DirectML provides GPU acceleration on Windows for AMD, Intel, and NVIDIA GPUs.

Requirements

  • Windows 10 version 1903 or later
  • DirectX 12 capable GPU

Enable Feature

Cargo.toml
[dependencies]
retto-core = { version = "0.1.5", features = ["backend-ort-directml"] }

Configure Device

use retto_core::prelude::*;

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::DirectML(0),
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};

DirectML Configuration

From ort_worker.rs:162, DirectML setup:
RettoOrtWorkerDevice::DirectML(id) => providers.push(
    DirectMLExecutionProvider::default()
        .with_device_id(id)
        .build(),
)
DirectML is particularly useful for AMD GPUs on Windows, which don’t support CUDA.

WebAssembly Backend

Run OCR in web browsers using WebAssembly.

Enable Feature

Cargo.toml
[dependencies]
retto-core = { version = "0.1.5", features = ["backend-ort-wasm"] }

Initialization

From ort_worker.rs:144, WASM requires explicit initialization:
#[cfg(target_family = "wasm")]
{
    ort::init()
        .with_global_thread_pool(ort::environment::GlobalThreadPoolOptions::default())
        .commit()
        .expect("Cannot initialize ort.");
}
WebAssembly execution is significantly slower than native backends. Consider server-side processing for production use.

Backend Selection Logic

From ort_worker.rs:152, Retto uses a fallback chain:
let mut providers = Vec::new();

// Try GPU backends first
match cfg.device {
    RettoOrtWorkerDevice::Cuda(id) => {
        providers.push(CUDAExecutionProvider::default().with_device_id(id).build());
    }
    RettoOrtWorkerDevice::DirectML(id) => {
        providers.push(DirectMLExecutionProvider::default().with_device_id(id).build());
    }
    _ => {}
};

// Always add CPU as fallback
providers.push(CPUExecutionProvider::default().build());
If the GPU backend fails to initialize, execution automatically falls back to CPU.
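The ordering guarantee can be demonstrated with a self-contained sketch. The enums below mirror the shape of retto's device type but compile on their own, so the names are stand-ins rather than the real API:

```rust
// Stand-in types mirroring the fallback logic above; not retto's real API.
#[derive(Debug, PartialEq)]
enum Provider {
    Cuda(i32),
    DirectMl(i32),
    Cpu,
}

enum Device {
    Cpu,
    Cuda(i32),
    DirectMl(i32),
}

// Build the execution-provider chain: the configured GPU backend first
// (if any), then CPU, which is always appended as the final fallback.
fn provider_chain(device: Device) -> Vec<Provider> {
    let mut providers = Vec::new();
    match device {
        Device::Cuda(id) => providers.push(Provider::Cuda(id)),
        Device::DirectMl(id) => providers.push(Provider::DirectMl(id)),
        Device::Cpu => {}
    }
    providers.push(Provider::Cpu);
    providers
}

fn main() {
    // CPU is always last, so a failed GPU provider still leaves a working chain.
    println!("{:?}", provider_chain(Device::Cuda(1)));
}
```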

Performance Comparison

CPU

Performance: Baseline (1x)

Pros:
  • Universal compatibility
  • No extra dependencies
  • Stable and reliable

Cons:
  • Slower inference
  • Limited parallelism

Use Cases:
  • Development
  • Low-volume processing
  • Deployment without GPU access

CLI Backend Selection

The CLI supports runtime backend selection:
retto-cli --device cpu --images ./photos/
From main.rs:46, the CLI parses device configuration:
let device = match cli.device {
    DeviceKind::Cpu => RettoOrtWorkerDevice::CPU,
    #[cfg(feature = "backend-ort-cuda")]
    DeviceKind::Cuda => RettoOrtWorkerDevice::Cuda(cli.device_id),
    #[cfg(feature = "backend-ort-directml")]
    DeviceKind::DirectMl => RettoOrtWorkerDevice::DirectML(cli.device_id),
};

Choosing the Right Backend

1. Start with CPU: begin development with the CPU backend for maximum compatibility.

2. Profile your workload: measure throughput and latency requirements.

3. Enable GPU if needed:
  • CUDA for NVIDIA GPUs and maximum performance
  • DirectML for Windows with any GPU
  • WASM for browser-based processing

4. Optimize configuration: tune device IDs, batch sizes, and threading for your specific hardware.

Troubleshooting

CUDA Initialization Fails

Error: CUDA execution provider is not available
Solutions:
  • Verify CUDA Toolkit installation: nvcc --version
  • Check GPU availability: nvidia-smi
  • Ensure cuDNN is installed
  • Rebuild with --features backend-ort-cuda

DirectML Not Available

Error: DirectML execution provider is not available
Solutions:
  • Update Windows to version 1903+
  • Verify GPU drivers are up to date
  • Check DirectX 12 support

WASM Out of Memory

Error: Failed to allocate memory
Solutions:
  • Reduce image size before processing
  • Use lower resolution models
  • Increase browser memory limits

Environment Variables

Control ONNX Runtime behavior:
# Enable debug logging
export ORT_LOG_LEVEL=VERBOSE

# Set thread count
export ORT_NUM_THREADS=4

# Disable telemetry
export ORT_DISABLE_TELEMETRY=1

Next Steps

  • Model Loading: optimize model loading for your backend
  • Rust Usage: build applications with Retto
