
Overview

Workers are the inference engine layer in Retto, responsible for executing ONNX models. They abstract away backend-specific implementation details, allowing the same processor code to work with different inference engines. Currently, Retto provides one worker implementation:
  • RettoOrtWorker - ONNX Runtime backend with CPU, CUDA, and DirectML support

Worker Architecture

The worker abstraction is defined by two traits in worker.rs:

RettoInnerWorker Trait

From worker.rs:69:
pub(crate) trait RettoInnerWorker {
    fn det(&mut self, input: Array4<f32>) -> RettoResult<Array4<f32>>;
    fn cls(&mut self, input: Array4<f32>) -> RettoResult<Array2<f32>>;
    fn rec(&mut self, input: Array4<f32>) -> RettoResult<Array3<f32>>;  
}
Defines the three inference operations:
  • det: Detection model (4D input → 4D output)
  • cls: Classification model (4D input → 2D output)
  • rec: Recognition model (4D input → 3D output)
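
The dimensionality contract above can be sketched with a std-only mock (the `Tensor` type, `MockWorker`, and the timestep/class counts here are illustrative stand-ins, not part of Retto's API):

```rust
// Minimal stand-in for an n-dimensional tensor: flat data plus a shape.
#[derive(Debug)]
struct Tensor {
    shape: Vec<usize>,
    data: Vec<f32>,
}

impl Tensor {
    fn zeros(shape: &[usize]) -> Self {
        let len: usize = shape.iter().product();
        Tensor { shape: shape.to_vec(), data: vec![0.0; len] }
    }
}

// Mirrors RettoInnerWorker's shape contract: det is 4D→4D,
// cls is 4D→2D, rec is 4D→3D.
struct MockWorker;

impl MockWorker {
    fn det(&mut self, input: &Tensor) -> Tensor {
        let (n, h, w) = (input.shape[0], input.shape[2], input.shape[3]);
        Tensor::zeros(&[n, 1, h, w]) // single-channel probability map
    }
    fn cls(&mut self, input: &Tensor) -> Tensor {
        Tensor::zeros(&[input.shape[0], 2]) // scores for 0° and 180°
    }
    fn rec(&mut self, input: &Tensor) -> Tensor {
        let t = input.shape[3] / 8; // illustrative timestep count
        Tensor::zeros(&[input.shape[0], t, 100]) // 100: illustrative class count
    }
}

fn main() {
    let mut w = MockWorker;
    let img = Tensor::zeros(&[1, 3, 960, 960]);
    println!("det out: {:?}", w.det(&img).shape); // [1, 1, 960, 960]
    println!("cls out: {:?}", w.cls(&img).shape); // [1, 2]
    println!("rec out: {:?}", w.rec(&img).shape); // [1, 120, 100]
}
```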

RettoWorker Trait

From worker.rs:91:
pub trait RettoWorker: RettoInnerWorker {
    type RettoWorkerModelProvider: RettoWorkerModelProviderBuilder;
    type RettoWorkerConfig: Debug + Default + Clone + MaybeSerde;
    
    fn new(cfg: Self::RettoWorkerConfig) -> RettoResult<Self>
    where
        Self: Sized;
    
    fn init(&self) -> RettoResult<()>;
}
Adds configuration and initialization capabilities on top of inference operations.

Model Loading Strategies

Retto supports three model loading strategies, selected by target platform and enabled features.

1. HuggingFace Hub

From worker.rs:49:
RettoWorkerModelSource::HuggingFace {
    repo: String,
    model: String,
}
Automatically downloads models from HuggingFace Hub and caches them locally. Requirements:
  • hf-hub feature enabled
  • Not WebAssembly target
  • Internet connection for first run
Default models (ort_worker.rs:60):
fn from_hf_hub_v4_default() -> Self {
    let hf_repo = "pk5ls20/PaddleModel";
    Self(RettoWorkerModelProvider {
        det: RettoWorkerModelSource::HuggingFace {
            repo: hf_repo.to_string(),
            model: "retto/onnx/ch_PP-OCRv4_det_infer.onnx".to_string(),
        },
        rec: RettoWorkerModelSource::HuggingFace {
            repo: hf_repo.to_string(),
            model: "retto/onnx/ch_PP-OCRv4_rec_infer.onnx".to_string(),
        },
        cls: RettoWorkerModelSource::HuggingFace {
            repo: hf_repo.to_string(),
            model: "retto/onnx/ch_ppocr_mobile_v2.0_cls_infer.onnx".to_string(),
        },
    })
}
Models used:
  • Detection: PaddleOCR v4 detection model
  • Recognition: PaddleOCR v4 recognition model (Chinese + English)
  • Classification: PaddleOCR v2.0 mobile classification model

2. Local Path

From worker.rs:33:
RettoWorkerModelSource::Path(String)
Loads models from local filesystem. Requirements:
  • Not WebAssembly target
  • Models must exist at specified paths
Default paths (ort_worker.rs:79):
fn from_local_v4_path_default() -> Self {
    Self(RettoWorkerModelProvider {
        det: RettoWorkerModelSource::Path("ch_PP-OCRv4_det_infer.onnx".into()),
        rec: RettoWorkerModelSource::Path("ch_PP-OCRv4_rec_infer.onnx".into()),
        cls: RettoWorkerModelSource::Path("ch_ppocr_mobile_v2.0_cls_infer.onnx".into()),
    })
}
Path validation (worker.rs:34):
RettoWorkerModelSource::Path(path) => {
    let path = std::path::PathBuf::from(path);
    match path.exists() {
        true => Ok(RettoWorkerModelResolvedSource::Path(path)),
        false => Err(RettoError::ModelNotFoundError(
            path.into_os_string().to_string_lossy().to_string(),
        )),
    }
}

3. Embedded Blob (WebAssembly)

From worker.rs:21:
RettoWorkerModelSource::Blob(Vec<u8>)
Embeds model data directly in the binary. Requirements:
  • download-models feature enabled
  • Increases binary size significantly (~50MB for all models)
Default blob loading (ort_worker.rs:88):
#[cfg(feature = "download-models")]
fn from_local_v4_blob_default() -> Self {
    Self(RettoWorkerModelProvider {
        det: RettoWorkerModelSource::Blob(
            include_bytes!("../../models/ch_PP-OCRv4_det_infer.onnx").to_vec(),
        ),
        rec: RettoWorkerModelSource::Blob(
            include_bytes!("../../models/ch_PP-OCRv4_rec_infer.onnx").to_vec(),
        ),
        cls: RettoWorkerModelSource::Blob(
            include_bytes!("../../models/ch_ppocr_mobile_v2.0_cls_infer.onnx").to_vec(),
        ),
    })
}

Default Strategy Selection

From worker.rs:81:
fn default_provider() -> Self {
    #[cfg(all(not(target_family = "wasm"), feature = "hf-hub"))]
    return Self::from_hf_hub_v4_default();
    
    #[cfg(all(not(target_family = "wasm"), not(feature = "hf-hub")))]
    return Self::from_local_v4_path_default();
    
    #[cfg(target_family = "wasm")]
    return Self::from_local_v4_blob_default();
}
Priority:
  1. HuggingFace Hub (if available)
  2. Local path (native without hf-hub)
  3. Embedded blob (WebAssembly)

RettoOrtWorker Implementation

The ONNX Runtime worker provides cross-platform inference with multiple execution providers.

Structure

From ort_worker.rs:113:
pub struct RettoOrtWorker {
    cfg: RettoOrtWorkerConfig,
    det_session: ort::session::Session,
    rec_session: ort::session::Session,
    cls_session: ort::session::Session,
}
Each processor has its own ONNX Runtime session for independent inference.

Configuration

From ort_worker.rs:52:
pub struct RettoOrtWorkerConfig {
    pub device: RettoOrtWorkerDevice,
    pub models: RettoOrtWorkerModelProvider,
}

Device Selection

From ort_worker.rs:21:
pub enum RettoOrtWorkerDevice {
    #[default]
    CPU,
    
    #[cfg(feature = "backend-ort-cuda")]
    Cuda(i32),  // Device ID
    
    #[cfg(feature = "backend-ort-directml")]
    DirectML(i32),  // Device ID
}
CPU (default):
  • Always available
  • No additional dependencies
  • Slower than GPU options
CUDA:
  • Requires backend-ort-cuda feature
  • NVIDIA GPUs only
  • Best performance on Linux/Windows
DirectML:
  • Requires backend-ort-directml feature
  • Works with any GPU on Windows 10+
  • Good compatibility, moderate performance

Session Creation

From ort_worker.rs:140:
fn new(cfg: Self::RettoWorkerConfig) -> RettoResult<Self> {
    #[cfg(target_family = "wasm")]
    {
        ort::init()
            .with_global_thread_pool(ort::environment::GlobalThreadPoolOptions::default())
            .commit()
            .expect("Cannot initialize ort.");
    }
    
    let mut providers = Vec::new();
    match cfg.device {
        #[cfg(feature = "backend-ort-cuda")]
        RettoOrtWorkerDevice::Cuda(id) => providers.push(
            CUDAExecutionProvider::default()
                .with_arena_extend_strategy(NextPowerOfTwo)
                .with_conv_algorithm_search(Exhaustive)
                .with_device_id(id)
                .build(),
        ),
        #[cfg(feature = "backend-ort-directml")]
        RettoOrtWorkerDevice::DirectML(id) => providers.push(
            DirectMLExecutionProvider::default()
                .with_device_id(id)
                .build(),
        ),
        _ => {}
    };
    providers.push(CPUExecutionProvider::default().build());
    
    let det_session = build_ort_session(cfg.models.det.clone(), &providers)?;
    let cls_session = build_ort_session(cfg.models.cls.clone(), &providers)?;
    let rec_session = build_ort_session(cfg.models.rec.clone(), &providers)?;
    // ...
}
Key points:
  1. WebAssembly requires explicit ONNX Runtime initialization
  2. CPU provider is always added as fallback
  3. Three separate sessions are created for det/cls/rec models
  4. All sessions use the same execution providers

CUDA Configuration

From ort_worker.rs:156:
CUDAExecutionProvider::default()
    .with_arena_extend_strategy(NextPowerOfTwo)
    .with_conv_algorithm_search(Exhaustive)
    .with_device_id(id)
    .build()
Arena extend strategy: NextPowerOfTwo allocates memory in power-of-2 chunks, reducing fragmentation.
Conv algorithm search: Exhaustive finds the fastest convolution algorithm for your GPU (slower startup, faster inference).
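
To see what NextPowerOfTwo means for allocation sizes, std's u64::next_power_of_two shows the rounding (a sketch of the growth pattern, not ONNX Runtime's actual allocator code):

```rust
fn main() {
    // A 50 MB request lands in a 64 MB arena chunk; outgrowing that jumps
    // to 128 MB, and so on. Fewer, larger extensions mean less
    // fragmentation at the cost of some slack memory.
    for req_mb in [3u64, 50, 70, 130] {
        let chunk = (req_mb * 1024 * 1024).next_power_of_two() / (1024 * 1024);
        println!("request {req_mb} MB -> arena chunk {chunk} MB");
    }
}
```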

Session Building

From ort_worker.rs:120:
fn build_ort_session(
    model_source: RettoWorkerModelSource,
    providers: &[ExecutionProviderDispatch],
) -> RettoResult<ort::session::Session> {
    let builder = ort::session::Session::builder()?
        .with_execution_providers(providers)?;
    let model_source = model_source.resolve()?;
    match model_source {
        #[cfg(not(target_family = "wasm"))]
        RettoWorkerModelResolvedSource::Path(path) => {
            builder.commit_from_file(path).map_err(RettoError::from)
        }
        RettoWorkerModelResolvedSource::Blob(blob) => {
            builder.commit_from_memory(&blob).map_err(RettoError::from)
        }
    }
}
Supports both file-based and memory-based model loading.
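
The two-way dispatch can be mirrored with std only (this `load_model_bytes` helper is illustrative; the real build_ort_session hands the path or blob straight to the ort session builder rather than reading bytes itself):

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

// Stand-in for RettoWorkerModelResolvedSource.
enum ResolvedSource {
    Path(PathBuf),
    Blob(Vec<u8>),
}

// File-based sources are read from disk; blob sources are already in memory.
fn load_model_bytes(src: ResolvedSource) -> io::Result<Vec<u8>> {
    match src {
        ResolvedSource::Path(p) => fs::read(p),
        ResolvedSource::Blob(b) => Ok(b),
    }
}

fn main() -> io::Result<()> {
    // The blob path needs no I/O at all.
    let bytes = load_model_bytes(ResolvedSource::Blob(vec![0x08, 0x07]))?;
    println!("blob is {} bytes", bytes.len());

    // A missing file surfaces as an error, like ModelNotFoundError upstream.
    let missing = load_model_bytes(ResolvedSource::Path("no_such_model.onnx".into()));
    println!("missing file is error: {}", missing.is_err());
    Ok(())
}
```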

Inference Implementation

From ort_worker.rs:188:

Detection

fn det(&mut self, input: Array4<f32>) -> RettoResult<Array4<f32>> {
    let outputs = self.det_session.run(ort::inputs! {
        "x" => TensorRef::from_array_view(&input.as_standard_layout())?
    })?;
    let val = &outputs[0]
        .try_extract_array::<f32>()?  
        .into_dimensionality::<Ix4>()?;
    let output = val.to_owned();
    Ok(output)
}
Input: Array4<f32> with shape [1, 3, H, W] (batch, channels, height, width)
Output: Array4<f32> with shape [1, 1, H, W] (probability map)
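
Downstream, the probability map is typically binarized before box extraction; a minimal thresholding sketch over a flattened map (the 0.3 threshold is an assumption for illustration, not necessarily Retto's configured value):

```rust
// Binarize a detection probability map: pixels above the threshold
// become text candidates (1), the rest background (0).
fn binarize(prob_map: &[f32], threshold: f32) -> Vec<u8> {
    prob_map.iter().map(|&p| (p > threshold) as u8).collect()
}

fn main() {
    let map = [0.02, 0.85, 0.31, 0.29, 0.97, 0.10];
    println!("{:?}", binarize(&map, 0.3)); // [0, 1, 1, 0, 1, 0]
}
```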

Classification

fn cls(&mut self, input: Array4<f32>) -> RettoResult<Array2<f32>> {
    let outputs = self.cls_session.run(ort::inputs! {
        "x" => TensorRef::from_array_view(&input.as_standard_layout())?
    })?;
    let val = &outputs[0]
        .try_extract_array::<f32>()?
        .into_dimensionality::<Ix2>()?;
    let output = val.to_owned();
    Ok(output)
}
Input: Array4<f32> with shape [N, 3, 48, 192] (batch of N images)
Output: Array2<f32> with shape [N, 2] (class probabilities for 0° and 180°)
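
Each [N, 2] row is reduced to a rotate/keep decision by comparing the two scores, usually gated by a confidence threshold (the 0.9 value here mirrors common PaddleOCR defaults and is an assumption, not necessarily Retto's):

```rust
// For each crop, rotate 180° only when the "180°" class wins with
// confidence above the threshold; otherwise leave the crop as-is.
fn should_rotate(scores: &[[f32; 2]], threshold: f32) -> Vec<bool> {
    scores
        .iter()
        .map(|&[p0, p180]| p180 > p0 && p180 > threshold)
        .collect()
}

fn main() {
    let scores = [[0.98, 0.02], [0.05, 0.95], [0.40, 0.60]];
    // The third crop prefers 180° but falls below the threshold.
    println!("{:?}", should_rotate(&scores, 0.9)); // [false, true, false]
}
```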

Recognition

fn rec(&mut self, input: Array4<f32>) -> RettoResult<Array3<f32>> {
    let outputs = self.rec_session.run(ort::inputs! {
        "x" => TensorRef::from_array_view(&input.as_standard_layout())?
    })?;
    let val = &outputs[0]
        .try_extract_array::<f32>()?
        .into_dimensionality::<Ix3>()?;
    let output = val.to_owned();
    Ok(output)
}
Input: Array4<f32> with shape [N, 3, 48, W] (variable width)
Output: Array3<f32> with shape [N, T, C] (N sequences of T timesteps with C character classes)
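
The [N, T, C] output is typically turned into text with CTC greedy decoding: argmax per timestep, collapse consecutive repeats, drop the blank class (index 0 here, following PaddleOCR convention; the two-character charset is purely illustrative):

```rust
// Greedy CTC decode of one sequence: T timesteps × C classes.
// Class 0 is the CTC blank; class i (i > 0) maps to charset[i - 1].
fn ctc_greedy_decode(logits: &[Vec<f32>], charset: &[char]) -> String {
    let mut out = String::new();
    let mut prev = 0usize;
    for step in logits {
        let best = step
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap();
        // Emit only on a non-blank class that differs from the previous step.
        if best != 0 && best != prev {
            out.push(charset[best - 1]);
        }
        prev = best;
    }
    out
}

fn main() {
    let charset = ['h', 'i']; // classes 1 and 2
    let logits = vec![
        vec![0.1, 0.8, 0.1],   // 'h'
        vec![0.1, 0.8, 0.1],   // repeat, collapsed
        vec![0.9, 0.05, 0.05], // blank
        vec![0.1, 0.1, 0.8],   // 'i'
    ];
    println!("{}", ctc_greedy_decode(&logits, &charset)); // "hi"
}
```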

Usage Examples

Default Configuration (CPU)

use retto::prelude::*;

let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig::default(),
    ..Default::default()
};

let mut session = RettoSession::new(config)?;
Uses HuggingFace Hub models (if available) with CPU inference.

CUDA GPU

let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::Cuda(0),  // GPU 0
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};

let mut session = RettoSession::new(config)?;
Requires:
  • cargo build --features backend-ort-cuda
  • CUDA toolkit installed
  • NVIDIA GPU

DirectML (Windows)

let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::DirectML(0),
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};

let mut session = RettoSession::new(config)?;
Requires:
  • cargo build --features backend-ort-directml
  • Windows 10 version 1903 or later
  • Any DirectX 12 capable GPU

Custom Model Paths

let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::CPU,
        models: RettoOrtWorkerModelProvider(RettoWorkerModelProvider {
            det: RettoWorkerModelSource::Path("/models/custom_det.onnx".into()),
            cls: RettoWorkerModelSource::Path("/models/custom_cls.onnx".into()),
            rec: RettoWorkerModelSource::Path("/models/custom_rec.onnx".into()),
        }),
    },
    ..Default::default()
};

let mut session = RettoSession::new(config)?;

HuggingFace Custom Repository

let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::CPU,
        models: RettoOrtWorkerModelProvider(RettoWorkerModelProvider {
            det: RettoWorkerModelSource::HuggingFace {
                repo: "myorg/mymodels".to_string(),
                model: "det_model.onnx".to_string(),
            },
            cls: RettoWorkerModelSource::HuggingFace {
                repo: "myorg/mymodels".to_string(),
                model: "cls_model.onnx".to_string(),
            },
            rec: RettoWorkerModelSource::HuggingFace {
                repo: "myorg/mymodels".to_string(),
                model: "rec_model.onnx".to_string(),
            },
        }),
    },
    ..Default::default()
};

let mut session = RettoSession::new(config)?;

Feature Flags

Backend Selection

  • backend-ort - ONNX Runtime support (enabled by default)
  • backend-ort-cuda - CUDA execution provider
  • backend-ort-directml - DirectML execution provider

Model Loading

  • hf-hub - HuggingFace Hub integration (enabled by default)
  • download-models - Embed models in binary (for WebAssembly)

Other Features

  • serde - Serialization support for configurations

Performance Considerations

Memory Usage

  1. Three Sessions: Each model (det/cls/rec) loads independently, using ~100-300MB per model
  2. Model Caching: HuggingFace Hub caches models in ~/.cache/huggingface/
  3. GPU Memory: CUDA allocates arena memory for tensors

Initialization Time

  1. First Run: HuggingFace downloads models (~50-150MB)
  2. Session Creation: ONNX Runtime optimizes graphs (~1-5 seconds)
  3. CUDA Initialization: First inference tests algorithms (~2-10 seconds)
  4. Subsequent Runs: Models load from cache (~1-2 seconds)

Inference Speed

Typical timings for a 1920×1080 image with ~10 text regions:
| Backend             | Detection | Classification | Recognition | Total  |
|---------------------|-----------|----------------|-------------|--------|
| CPU (Intel i7)      | ~300ms    | ~20ms          | ~100ms      | ~420ms |
| CUDA (RTX 3080)     | ~40ms     | ~5ms           | ~15ms       | ~60ms  |
| DirectML (RTX 3080) | ~80ms     | ~10ms          | ~30ms       | ~120ms |
Actual performance varies based on image size, text density, and hardware capabilities.

Error Handling

Model Loading Errors

RettoError::ModelNotFoundError(String)
Returned when:
  • Local path doesn’t exist
  • Empty blob provided
  • HuggingFace download fails

ONNX Runtime Errors

RettoError::OrtError(ort::error::Error)
Returned when:
  • Session creation fails
  • Inference fails
  • Tensor conversion fails
  • Execution provider unavailable

Extending with Custom Workers

To implement a custom worker:
  1. Implement RettoInnerWorker with your inference logic
  2. Implement RettoWorker with configuration and initialization
  3. Define RettoWorkerConfig and RettoWorkerModelProvider types
  4. Use with RettoSession<YourWorker>
struct MyCustomWorker {
    // Your implementation
}

impl RettoInnerWorker for MyCustomWorker {
    fn det(&mut self, input: Array4<f32>) -> RettoResult<Array4<f32>> {
        // Your detection inference
    }
    
    fn cls(&mut self, input: Array4<f32>) -> RettoResult<Array2<f32>> {
        // Your classification inference
    }
    
    fn rec(&mut self, input: Array4<f32>) -> RettoResult<Array3<f32>> {
        // Your recognition inference
    }
}

impl RettoWorker for MyCustomWorker {
    type RettoWorkerModelProvider = MyModelProvider;
    type RettoWorkerConfig = MyWorkerConfig;
    
    fn new(cfg: Self::RettoWorkerConfig) -> RettoResult<Self> {
        // Initialize your worker
    }
    
    fn init(&self) -> RettoResult<()> {
        Ok(())
    }
}

Next Steps

Architecture

Understand how workers fit into the overall architecture

Processors

Learn about preprocessing and postprocessing pipelines