
Overview

Workers are the inference engine layer in Retto, responsible for executing ONNX models. They abstract away backend-specific implementation details, allowing the same processor code to work with different inference engines. Currently, Retto provides one worker implementation:
  • RettoOrtWorker - ONNX Runtime backend with CPU, CUDA, and DirectML support

Worker Architecture

The worker abstraction is defined by two traits in worker.rs:

RettoInnerWorker Trait

From worker.rs:69:
pub(crate) trait RettoInnerWorker {
    fn det(&mut self, input: Array4<f32>) -> RettoResult<Array4<f32>>;
    fn cls(&mut self, input: Array4<f32>) -> RettoResult<Array2<f32>>;
    fn rec(&mut self, input: Array4<f32>) -> RettoResult<Array3<f32>>;  
}
Defines the three inference operations:
  • det: Detection model (4D input → 4D output)
  • cls: Classification model (4D input → 2D output)
  • rec: Recognition model (4D input → 3D output)
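
The dimensionality contract above can be sketched with a std-only mock (the `Tensor` type, `MockWorker`, and the timestep/class counts here are illustrative stand-ins, not part of Retto's API):

```rust
// Minimal stand-in for an n-dimensional tensor: flat data plus a shape.
#[derive(Debug)]
struct Tensor {
    shape: Vec<usize>,
    data: Vec<f32>,
}

impl Tensor {
    fn zeros(shape: &[usize]) -> Self {
        let len: usize = shape.iter().product();
        Tensor { shape: shape.to_vec(), data: vec![0.0; len] }
    }
}

// Mirrors RettoInnerWorker's shape contract: det is 4D→4D,
// cls is 4D→2D, rec is 4D→3D.
struct MockWorker;

impl MockWorker {
    fn det(&mut self, input: &Tensor) -> Tensor {
        let (n, h, w) = (input.shape[0], input.shape[2], input.shape[3]);
        Tensor::zeros(&[n, 1, h, w]) // single-channel probability map
    }
    fn cls(&mut self, input: &Tensor) -> Tensor {
        Tensor::zeros(&[input.shape[0], 2]) // scores for 0° and 180°
    }
    fn rec(&mut self, input: &Tensor) -> Tensor {
        let t = input.shape[3] / 8; // illustrative timestep count
        Tensor::zeros(&[input.shape[0], t, 100]) // 100: illustrative class count
    }
}

fn main() {
    let mut w = MockWorker;
    let img = Tensor::zeros(&[1, 3, 960, 960]);
    println!("det out: {:?}", w.det(&img).shape); // [1, 1, 960, 960]
    println!("cls out: {:?}", w.cls(&img).shape); // [1, 2]
    println!("rec out: {:?}", w.rec(&img).shape); // [1, 120, 100]
}
```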

RettoWorker Trait

From worker.rs:91:
pub trait RettoWorker: RettoInnerWorker {
    type RettoWorkerModelProvider: RettoWorkerModelProviderBuilder;
    type RettoWorkerConfig: Debug + Default + Clone + MaybeSerde;
    
    fn new(cfg: Self::RettoWorkerConfig) -> RettoResult<Self>
    where
        Self: Sized;
    
    fn init(&self) -> RettoResult<()>;
}
Adds configuration and initialization capabilities on top of inference operations.

Model Loading Strategies

Retto supports three model loading strategies, selected by target platform and enabled features.

1. HuggingFace Hub

From worker.rs:49:
RettoWorkerModelSource::HuggingFace {
    repo: String,
    model: String,
}
Automatically downloads models from HuggingFace Hub and caches them locally. Requirements:
  • hf-hub feature enabled
  • Not WebAssembly target
  • Internet connection for first run
Default models (ort_worker.rs:60):
fn from_hf_hub_v4_default() -> Self {
    let hf_repo = "pk5ls20/PaddleModel";
    Self(RettoWorkerModelProvider {
        det: RettoWorkerModelSource::HuggingFace {
            repo: hf_repo.to_string(),
            model: "retto/onnx/ch_PP-OCRv4_det_infer.onnx".to_string(),
        },
        rec: RettoWorkerModelSource::HuggingFace {
            repo: hf_repo.to_string(),
            model: "retto/onnx/ch_PP-OCRv4_rec_infer.onnx".to_string(),
        },
        cls: RettoWorkerModelSource::HuggingFace {
            repo: hf_repo.to_string(),
            model: "retto/onnx/ch_ppocr_mobile_v2.0_cls_infer.onnx".to_string(),
        },
    })
}
Models used:
  • Detection: PaddleOCR v4 detection model
  • Recognition: PaddleOCR v4 recognition model (Chinese + English)
  • Classification: PaddleOCR v2.0 mobile classification model

2. Local Path

From worker.rs:33:
RettoWorkerModelSource::Path(String)
Loads models from local filesystem. Requirements:
  • Not WebAssembly target
  • Models must exist at specified paths
Default paths (ort_worker.rs:79):
fn from_local_v4_path_default() -> Self {
    Self(RettoWorkerModelProvider {
        det: RettoWorkerModelSource::Path("ch_PP-OCRv4_det_infer.onnx".into()),
        rec: RettoWorkerModelSource::Path("ch_PP-OCRv4_rec_infer.onnx".into()),
        cls: RettoWorkerModelSource::Path("ch_ppocr_mobile_v2.0_cls_infer.onnx".into()),
    })
}
Path validation (worker.rs:34):
RettoWorkerModelSource::Path(path) => {
    let path = std::path::PathBuf::from(path);
    match path.exists() {
        true => Ok(RettoWorkerModelResolvedSource::Path(path)),
        false => Err(RettoError::ModelNotFoundError(
            path.into_os_string().to_string_lossy().to_string(),
        )),
    }
}

3. Embedded Blob (WebAssembly)

From worker.rs:21:
RettoWorkerModelSource::Blob(Vec<u8>)
Embeds model data directly in the binary. Requirements:
  • download-models feature enabled
  • Increases binary size significantly (~50MB for all models)
Default blob loading (ort_worker.rs:88):
#[cfg(feature = "download-models")]
fn from_local_v4_blob_default() -> Self {
    Self(RettoWorkerModelProvider {
        det: RettoWorkerModelSource::Blob(
            include_bytes!("../../models/ch_PP-OCRv4_det_infer.onnx").to_vec(),
        ),
        rec: RettoWorkerModelSource::Blob(
            include_bytes!("../../models/ch_PP-OCRv4_rec_infer.onnx").to_vec(),
        ),
        cls: RettoWorkerModelSource::Blob(
            include_bytes!("../../models/ch_ppocr_mobile_v2.0_cls_infer.onnx").to_vec(),
        ),
    })
}

Default Strategy Selection

From worker.rs:81:
fn default_provider() -> Self {
    #[cfg(all(not(target_family = "wasm"), feature = "hf-hub"))]
    return Self::from_hf_hub_v4_default();
    
    #[cfg(all(not(target_family = "wasm"), not(feature = "hf-hub")))]
    return Self::from_local_v4_path_default();
    
    #[cfg(target_family = "wasm")]
    return Self::from_local_v4_blob_default();
}
Priority:
  1. HuggingFace Hub (if available)
  2. Local path (native without hf-hub)
  3. Embedded blob (WebAssembly)

RettoOrtWorker Implementation

The ONNX Runtime worker provides cross-platform inference with multiple execution providers.

Structure

From ort_worker.rs:113:
pub struct RettoOrtWorker {
    cfg: RettoOrtWorkerConfig,
    det_session: ort::session::Session,
    rec_session: ort::session::Session,
    cls_session: ort::session::Session,
}
Each processor has its own ONNX Runtime session for independent inference.

Configuration

From ort_worker.rs:52:
pub struct RettoOrtWorkerConfig {
    pub device: RettoOrtWorkerDevice,
    pub models: RettoOrtWorkerModelProvider,
}

Device Selection

From ort_worker.rs:21:
pub enum RettoOrtWorkerDevice {
    #[default]
    CPU,
    
    #[cfg(feature = "backend-ort-cuda")]
    Cuda(i32),  // Device ID
    
    #[cfg(feature = "backend-ort-directml")]
    DirectML(i32),  // Device ID
}
CPU (default):
  • Always available
  • No additional dependencies
  • Slower than GPU options
CUDA:
  • Requires backend-ort-cuda feature
  • NVIDIA GPUs only
  • Best performance on Linux/Windows
DirectML:
  • Requires backend-ort-directml feature
  • Works with any GPU on Windows 10+
  • Good compatibility, moderate performance

Session Creation

From ort_worker.rs:140:
fn new(cfg: Self::RettoWorkerConfig) -> RettoResult<Self> {
    #[cfg(target_family = "wasm")]
    {
        ort::init()
            .with_global_thread_pool(ort::environment::GlobalThreadPoolOptions::default())
            .commit()
            .expect("Cannot initialize ort.");
    }
    
    let mut providers = Vec::new();
    match cfg.device {
        #[cfg(feature = "backend-ort-cuda")]
        RettoOrtWorkerDevice::Cuda(id) => providers.push(
            CUDAExecutionProvider::default()
                .with_arena_extend_strategy(NextPowerOfTwo)
                .with_conv_algorithm_search(Exhaustive)
                .with_device_id(id)
                .build(),
        ),
        #[cfg(feature = "backend-ort-directml")]
        RettoOrtWorkerDevice::DirectML(id) => providers.push(
            DirectMLExecutionProvider::default()
                .with_device_id(id)
                .build(),
        ),
        _ => {}
    };
    providers.push(CPUExecutionProvider::default().build());
    
    let det_session = build_ort_session(cfg.models.det.clone(), &providers)?;
    let cls_session = build_ort_session(cfg.models.cls.clone(), &providers)?;
    let rec_session = build_ort_session(cfg.models.rec.clone(), &providers)?;
    // ...
}
Key points:
  1. WebAssembly requires explicit ONNX Runtime initialization
  2. CPU provider is always added as fallback
  3. Three separate sessions are created for det/cls/rec models
  4. All sessions use the same execution providers

CUDA Configuration

From ort_worker.rs:156:
CUDAExecutionProvider::default()
    .with_arena_extend_strategy(NextPowerOfTwo)
    .with_conv_algorithm_search(Exhaustive)
    .with_device_id(id)
    .build()
Arena extend strategy: NextPowerOfTwo allocates memory in power-of-2 chunks, reducing fragmentation.
Conv algorithm search: Exhaustive finds the fastest convolution algorithm for your GPU (slower startup, faster inference).
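
To see what NextPowerOfTwo means for allocation sizes, std's u64::next_power_of_two shows the rounding (a sketch of the growth pattern, not ONNX Runtime's actual allocator code):

```rust
fn main() {
    // A 50 MB request lands in a 64 MB arena chunk; outgrowing that jumps
    // to 128 MB, and so on. Fewer, larger extensions mean less
    // fragmentation at the cost of some slack memory.
    for req_mb in [3u64, 50, 70, 130] {
        let chunk = (req_mb * 1024 * 1024).next_power_of_two() / (1024 * 1024);
        println!("request {req_mb} MB -> arena chunk {chunk} MB");
    }
}
```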

Session Building

From ort_worker.rs:120:
fn build_ort_session(
    model_source: RettoWorkerModelSource,
    providers: &[ExecutionProviderDispatch],
) -> RettoResult<ort::session::Session> {
    let builder = ort::session::Session::builder()?
        .with_execution_providers(providers)?;
    let model_source = model_source.resolve()?;
    match model_source {
        #[cfg(not(target_family = "wasm"))]
        RettoWorkerModelResolvedSource::Path(path) => {
            builder.commit_from_file(path).map_err(RettoError::from)
        }
        RettoWorkerModelResolvedSource::Blob(blob) => {
            builder.commit_from_memory(&blob).map_err(RettoError::from)
        }
    }
}
Supports both file-based and memory-based model loading.
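
The two-way dispatch can be mirrored with std only (this `load_model_bytes` helper is illustrative; the real build_ort_session hands the path or blob straight to the ort session builder rather than reading bytes itself):

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

// Stand-in for RettoWorkerModelResolvedSource.
enum ResolvedSource {
    Path(PathBuf),
    Blob(Vec<u8>),
}

// File-based sources are read from disk; blob sources are already in memory.
fn load_model_bytes(src: ResolvedSource) -> io::Result<Vec<u8>> {
    match src {
        ResolvedSource::Path(p) => fs::read(p),
        ResolvedSource::Blob(b) => Ok(b),
    }
}

fn main() -> io::Result<()> {
    // The blob path needs no I/O at all.
    let bytes = load_model_bytes(ResolvedSource::Blob(vec![0x08, 0x07]))?;
    println!("blob is {} bytes", bytes.len());

    // A missing file surfaces as an error, like ModelNotFoundError upstream.
    let missing = load_model_bytes(ResolvedSource::Path("no_such_model.onnx".into()));
    println!("missing file is error: {}", missing.is_err());
    Ok(())
}
```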

Inference Implementation

From ort_worker.rs:188:

Detection

fn det(&mut self, input: Array4<f32>) -> RettoResult<Array4<f32>> {
    let outputs = self.det_session.run(ort::inputs! {
        "x" => TensorRef::from_array_view(&input.as_standard_layout())?
    })?;
    let val = &outputs[0]
        .try_extract_array::<f32>()?  
        .into_dimensionality::<Ix4>()?;
    let output = val.to_owned();
    Ok(output)
}
Input: Array4<f32> with shape [1, 3, H, W] (batch, channels, height, width)
Output: Array4<f32> with shape [1, 1, H, W] (probability map)
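
Downstream, the probability map is typically binarized before box extraction; a minimal thresholding sketch over a flattened map (the 0.3 threshold is an assumption for illustration, not necessarily Retto's configured value):

```rust
// Binarize a detection probability map: pixels above the threshold
// become text candidates (1), the rest background (0).
fn binarize(prob_map: &[f32], threshold: f32) -> Vec<u8> {
    prob_map.iter().map(|&p| (p > threshold) as u8).collect()
}

fn main() {
    let map = [0.02, 0.85, 0.31, 0.29, 0.97, 0.10];
    println!("{:?}", binarize(&map, 0.3)); // [0, 1, 1, 0, 1, 0]
}
```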

Classification

fn cls(&mut self, input: Array4<f32>) -> RettoResult<Array2<f32>> {
    let outputs = self.cls_session.run(ort::inputs! {
        "x" => TensorRef::from_array_view(&input.as_standard_layout())?
    })?;
    let val = &outputs[0]
        .try_extract_array::<f32>()?
        .into_dimensionality::<Ix2>()?;
    let output = val.to_owned();
    Ok(output)
}
Input: Array4<f32> with shape [N, 3, 48, 192] (batch of N images)
Output: Array2<f32> with shape [N, 2] (class probabilities for 0° and 180°)
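
Each [N, 2] row is reduced to a rotate/keep decision by comparing the two scores, usually gated by a confidence threshold (the 0.9 value here mirrors common PaddleOCR defaults and is an assumption, not necessarily Retto's):

```rust
// For each crop, rotate 180° only when the "180°" class wins with
// confidence above the threshold; otherwise leave the crop as-is.
fn should_rotate(scores: &[[f32; 2]], threshold: f32) -> Vec<bool> {
    scores
        .iter()
        .map(|&[p0, p180]| p180 > p0 && p180 > threshold)
        .collect()
}

fn main() {
    let scores = [[0.98, 0.02], [0.05, 0.95], [0.40, 0.60]];
    // The third crop prefers 180° but falls below the threshold.
    println!("{:?}", should_rotate(&scores, 0.9)); // [false, true, false]
}
```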

Recognition

fn rec(&mut self, input: Array4<f32>) -> RettoResult<Array3<f32>> {
    let outputs = self.rec_session.run(ort::inputs! {
        "x" => TensorRef::from_array_view(&input.as_standard_layout())?
    })?;
    let val = &outputs[0]
        .try_extract_array::<f32>()?
        .into_dimensionality::<Ix3>()?;
    let output = val.to_owned();
    Ok(output)
}
Input: Array4<f32> with shape [N, 3, 48, W] (variable width)
Output: Array3<f32> with shape [N, T, C] (N sequences of T timesteps with C character classes)
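
The [N, T, C] output is typically turned into text with CTC greedy decoding: argmax per timestep, collapse consecutive repeats, drop the blank class (index 0 here, following PaddleOCR convention; the two-character charset is purely illustrative):

```rust
// Greedy CTC decode of one sequence: T timesteps × C classes.
// Class 0 is the CTC blank; class i (i > 0) maps to charset[i - 1].
fn ctc_greedy_decode(logits: &[Vec<f32>], charset: &[char]) -> String {
    let mut out = String::new();
    let mut prev = 0usize;
    for step in logits {
        let best = step
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap();
        // Emit only on a non-blank class that differs from the previous step.
        if best != 0 && best != prev {
            out.push(charset[best - 1]);
        }
        prev = best;
    }
    out
}

fn main() {
    let charset = ['h', 'i']; // classes 1 and 2
    let logits = vec![
        vec![0.1, 0.8, 0.1],   // 'h'
        vec![0.1, 0.8, 0.1],   // repeat, collapsed
        vec![0.9, 0.05, 0.05], // blank
        vec![0.1, 0.1, 0.8],   // 'i'
    ];
    println!("{}", ctc_greedy_decode(&logits, &charset)); // "hi"
}
```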

Usage Examples

Default Configuration (CPU)

use retto::prelude::*;

let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig::default(),
    ..Default::default()
};

let mut session = RettoSession::new(config)?;
Uses HuggingFace Hub models (if available) with CPU inference.

CUDA GPU

let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::Cuda(0),  // GPU 0
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};

let mut session = RettoSession::new(config)?;
Requires:
  • cargo build --features backend-ort-cuda
  • CUDA toolkit installed
  • NVIDIA GPU

DirectML (Windows)

let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::DirectML(0),
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};

let mut session = RettoSession::new(config)?;
Requires:
  • cargo build --features backend-ort-directml
  • Windows 10 version 1903 or later
  • Any DirectX 12 capable GPU

Custom Model Paths

let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::CPU,
        models: RettoOrtWorkerModelProvider(RettoWorkerModelProvider {
            det: RettoWorkerModelSource::Path("/models/custom_det.onnx".into()),
            cls: RettoWorkerModelSource::Path("/models/custom_cls.onnx".into()),
            rec: RettoWorkerModelSource::Path("/models/custom_rec.onnx".into()),
        }),
    },
    ..Default::default()
};

let mut session = RettoSession::new(config)?;

HuggingFace Custom Repository

let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::CPU,
        models: RettoOrtWorkerModelProvider(RettoWorkerModelProvider {
            det: RettoWorkerModelSource::HuggingFace {
                repo: "myorg/mymodels".to_string(),
                model: "det_model.onnx".to_string(),
            },
            cls: RettoWorkerModelSource::HuggingFace {
                repo: "myorg/mymodels".to_string(),
                model: "cls_model.onnx".to_string(),
            },
            rec: RettoWorkerModelSource::HuggingFace {
                repo: "myorg/mymodels".to_string(),
                model: "rec_model.onnx".to_string(),
            },
        }),
    },
    ..Default::default()
};

let mut session = RettoSession::new(config)?;

Feature Flags

Backend Selection

  • backend-ort - ONNX Runtime support (enabled by default)
  • backend-ort-cuda - CUDA execution provider
  • backend-ort-directml - DirectML execution provider

Model Loading

  • hf-hub - HuggingFace Hub integration (enabled by default)
  • download-models - Embed models in binary (for WebAssembly)

Other Features

  • serde - Serialization support for configurations

Performance Considerations

Memory Usage

  1. Three Sessions: Each model (det/cls/rec) loads independently, using ~100-300MB per model
  2. Model Caching: HuggingFace Hub caches models in ~/.cache/huggingface/
  3. GPU Memory: CUDA allocates arena memory for tensors

Initialization Time

  1. First Run: HuggingFace downloads models (~50-150MB)
  2. Session Creation: ONNX Runtime optimizes graphs (~1-5 seconds)
  3. CUDA Initialization: First inference tests algorithms (~2-10 seconds)
  4. Subsequent Runs: Models load from cache (~1-2 seconds)

Inference Speed

Typical timings for a 1920×1080 image with ~10 text regions:
| Backend             | Detection | Classification | Recognition | Total  |
|---------------------|-----------|----------------|-------------|--------|
| CPU (Intel i7)      | ~300ms    | ~20ms          | ~100ms      | ~420ms |
| CUDA (RTX 3080)     | ~40ms     | ~5ms           | ~15ms       | ~60ms  |
| DirectML (RTX 3080) | ~80ms     | ~10ms          | ~30ms       | ~120ms |
Actual performance varies based on image size, text density, and hardware capabilities.

Error Handling

Model Loading Errors

RettoError::ModelNotFoundError(String)
Returned when:
  • Local path doesn’t exist
  • Empty blob provided
  • HuggingFace download fails

ONNX Runtime Errors

RettoError::OrtError(ort::error::Error)
Returned when:
  • Session creation fails
  • Inference fails
  • Tensor conversion fails
  • Execution provider unavailable

Extending with Custom Workers

To implement a custom worker:
  1. Implement RettoInnerWorker with your inference logic
  2. Implement RettoWorker with configuration and initialization
  3. Define RettoWorkerConfig and RettoWorkerModelProvider types
  4. Use with RettoSession<YourWorker>
struct MyCustomWorker {
    // Your implementation
}

impl RettoInnerWorker for MyCustomWorker {
    fn det(&mut self, input: Array4<f32>) -> RettoResult<Array4<f32>> {
        // Your detection inference
    }
    
    fn cls(&mut self, input: Array4<f32>) -> RettoResult<Array2<f32>> {
        // Your classification inference
    }
    
    fn rec(&mut self, input: Array4<f32>) -> RettoResult<Array3<f32>> {
        // Your recognition inference
    }
}

impl RettoWorker for MyCustomWorker {
    type RettoWorkerModelProvider = MyModelProvider;
    type RettoWorkerConfig = MyWorkerConfig;
    
    fn new(cfg: Self::RettoWorkerConfig) -> RettoResult<Self> {
        // Initialize your worker
    }
    
    fn init(&self) -> RettoResult<()> {
        Ok(())
    }
}

Next Steps

Architecture

Understand how workers fit into the overall architecture

Processors

Learn about preprocessing and postprocessing pipelines