Overview
Workers are the inference engine layer in Retto, responsible for executing ONNX models. They abstract away backend-specific implementation details, allowing the same processor code to work with different inference engines.
Currently, Retto provides one worker implementation:
- `RettoOrtWorker`: ONNX Runtime backend with CPU, CUDA, and DirectML support
Worker Architecture
The worker abstraction is defined by two traits in `worker.rs`:
RettoInnerWorker Trait
From `worker.rs:69`:

```rust
pub(crate) trait RettoInnerWorker {
    fn det(&mut self, input: Array4<f32>) -> RettoResult<Array4<f32>>;
    fn cls(&mut self, input: Array4<f32>) -> RettoResult<Array2<f32>>;
    fn rec(&mut self, input: Array4<f32>) -> RettoResult<Array3<f32>>;
}
```
Defines the three inference operations:
- `det`: Detection model (4D input → 4D output)
- `cls`: Classification model (4D input → 2D output)
- `rec`: Recognition model (4D input → 3D output)
RettoWorker Trait
From `worker.rs:91`:

```rust
pub trait RettoWorker: RettoInnerWorker {
    type RettoWorkerModelProvider: RettoWorkerModelProviderBuilder;
    type RettoWorkerConfig: Debug + Default + Clone + MaybeSerde;

    fn new(cfg: Self::RettoWorkerConfig) -> RettoResult<Self>
    where
        Self: Sized;

    fn init(&self) -> RettoResult<()>;
}
```
Adds configuration and initialization capabilities on top of inference operations.
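The associated-type pattern can be illustrated with a self-contained sketch. The `Worker`, `CpuConfig`, and `build` names below are simplified stand-ins for illustration, not Retto's real types:

```rust
use std::fmt::Debug;

// Simplified stand-in for RettoResult, for illustration only.
type Result<T> = std::result::Result<T, String>;

trait Worker {
    // Each worker names its own configuration type, so generic code can
    // construct any worker from its matching config.
    type Config: Debug + Default + Clone;

    fn new(cfg: Self::Config) -> Result<Self>
    where
        Self: Sized;

    fn init(&self) -> Result<()>;
}

#[derive(Debug, Default, Clone)]
struct CpuConfig {
    threads: usize,
}

struct CpuWorker {
    cfg: CpuConfig,
}

impl Worker for CpuWorker {
    type Config = CpuConfig;

    fn new(cfg: Self::Config) -> Result<Self> {
        Ok(Self { cfg })
    }

    fn init(&self) -> Result<()> {
        Ok(())
    }
}

// Generic constructor: the same call site works for any Worker impl,
// mirroring how RettoSession can be generic over RettoWorker.
fn build<W: Worker>(cfg: W::Config) -> Result<W> {
    let w = W::new(cfg)?;
    w.init()?;
    Ok(w)
}
```

Because the config is an associated type rather than a generic parameter, each worker fully determines its own configuration shape at the type level.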
Model Loading Strategies
Retto supports three model loading strategies based on platform and features:
1. HuggingFace Hub (Recommended)
From `worker.rs:49`:

```rust
RettoWorkerModelSource::HuggingFace {
    repo: String,
    model: String,
}
```
Automatically downloads models from HuggingFace Hub and caches them locally.
Requirements:
- `hf-hub` feature enabled
- Not a WebAssembly target
- Internet connection for the first run
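The cache-or-download decision can be sketched roughly as follows. `resolve_hf`, `Resolved`, and the cache layout here are illustrative assumptions, not Retto's actual implementation:

```rust
use std::path::{Path, PathBuf};

// Illustrative sketch (not Retto's real code): a HuggingFace source
// resolves to a cached file when present and signals a download otherwise.
#[derive(Debug, PartialEq)]
enum Resolved {
    Cached(PathBuf),
    NeedsDownload { repo: String, model: String },
}

fn resolve_hf(cache_root: &Path, repo: &str, model: &str) -> Resolved {
    // Assumed cache layout: <cache_root>/<repo>/<model>
    let candidate = cache_root.join(repo).join(model);
    if candidate.is_file() {
        Resolved::Cached(candidate)
    } else {
        Resolved::NeedsDownload {
            repo: repo.to_string(),
            model: model.to_string(),
        }
    }
}
```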
Default models (`ort_worker.rs:60`):

```rust
fn from_hf_hub_v4_default() -> Self {
    let hf_repo = "pk5ls20/PaddleModel";
    Self(RettoWorkerModelProvider {
        det: RettoWorkerModelSource::HuggingFace {
            repo: hf_repo.to_string(),
            model: "retto/onnx/ch_PP-OCRv4_det_infer.onnx".to_string(),
        },
        rec: RettoWorkerModelSource::HuggingFace {
            repo: hf_repo.to_string(),
            model: "retto/onnx/ch_PP-OCRv4_rec_infer.onnx".to_string(),
        },
        cls: RettoWorkerModelSource::HuggingFace {
            repo: hf_repo.to_string(),
            model: "retto/onnx/ch_ppocr_mobile_v2.0_cls_infer.onnx".to_string(),
        },
    })
}
```
Models used:
- Detection: PaddleOCR v4 detection model
- Recognition: PaddleOCR v4 recognition model (Chinese + English)
- Classification: PaddleOCR v2.0 mobile classification model
2. Local Path
From `worker.rs:33`:

```rust
RettoWorkerModelSource::Path(String)
```
Loads models from local filesystem.
Requirements:
- Not a WebAssembly target
- Models must exist at the specified paths
Default paths (`ort_worker.rs:79`):

```rust
fn from_local_v4_path_default() -> Self {
    Self(RettoWorkerModelProvider {
        det: RettoWorkerModelSource::Path("ch_PP-OCRv4_det_infer.onnx".into()),
        rec: RettoWorkerModelSource::Path("ch_PP-OCRv4_rec_infer.onnx".into()),
        cls: RettoWorkerModelSource::Path("ch_ppocr_mobile_v2.0_cls_infer.onnx".into()),
    })
}
```
Path validation (`worker.rs:34`):

```rust
RettoWorkerModelSource::Path(path) => {
    let path = std::path::PathBuf::from(path);
    match path.exists() {
        true => Ok(RettoWorkerModelResolvedSource::Path(path)),
        false => Err(RettoError::ModelNotFoundError(
            path.into_os_string().to_string_lossy().to_string(),
        )),
    }
}
```
3. Embedded Blob (WebAssembly)
From `worker.rs:21`:

```rust
RettoWorkerModelSource::Blob(Vec<u8>)
```
Embeds model data directly in the binary.
Requirements:
- `download-models` feature enabled
- Increases binary size significantly (~50MB for all models)
Default blob loading (`ort_worker.rs:88`):

```rust
#[cfg(feature = "download-models")]
fn from_local_v4_blob_default() -> Self {
    Self(RettoWorkerModelProvider {
        det: RettoWorkerModelSource::Blob(
            include_bytes!("../../models/ch_PP-OCRv4_det_infer.onnx").to_vec(),
        ),
        rec: RettoWorkerModelSource::Blob(
            include_bytes!("../../models/ch_PP-OCRv4_rec_infer.onnx").to_vec(),
        ),
        cls: RettoWorkerModelSource::Blob(
            include_bytes!("../../models/ch_ppocr_mobile_v2.0_cls_infer.onnx").to_vec(),
        ),
    })
}
```
Default Strategy Selection
From `worker.rs:81`:

```rust
fn default_provider() -> Self {
    #[cfg(all(not(target_family = "wasm"), feature = "hf-hub"))]
    return Self::from_hf_hub_v4_default();
    #[cfg(all(not(target_family = "wasm"), not(feature = "hf-hub")))]
    return Self::from_local_v4_path_default();
    #[cfg(target_family = "wasm")]
    return Self::from_local_v4_blob_default();
}
```
Priority:
1. HuggingFace Hub (native targets with the `hf-hub` feature)
2. Local path (native targets without `hf-hub`)
3. Embedded blob (WebAssembly)
RettoOrtWorker Implementation
The ONNX Runtime worker provides cross-platform inference with multiple execution providers.
Structure
From `ort_worker.rs:113`:

```rust
pub struct RettoOrtWorker {
    cfg: RettoOrtWorkerConfig,
    det_session: ort::session::Session,
    rec_session: ort::session::Session,
    cls_session: ort::session::Session,
}
```
Each processor has its own ONNX Runtime session for independent inference.
Configuration
From `ort_worker.rs:52`:

```rust
pub struct RettoOrtWorkerConfig {
    pub device: RettoOrtWorkerDevice,
    pub models: RettoOrtWorkerModelProvider,
}
```
Device Selection
From `ort_worker.rs:21`:

```rust
pub enum RettoOrtWorkerDevice {
    #[default]
    CPU,
    #[cfg(feature = "backend-ort-cuda")]
    Cuda(i32), // device ID
    #[cfg(feature = "backend-ort-directml")]
    DirectML(i32), // device ID
}
```
CPU (default):
- Always available
- No additional dependencies
- Slower than the GPU options

CUDA:
- Requires the `backend-ort-cuda` feature
- NVIDIA GPUs only
- Best performance on Linux/Windows

DirectML:
- Requires the `backend-ort-directml` feature
- Works with any DirectX 12 capable GPU on Windows 10+
- Good compatibility, moderate performance
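The resulting provider ordering (configured GPU first, CPU always appended last) can be modeled with a small self-contained sketch; the `Device` and `Provider` enums here are simplified stand-ins for the Retto/ort types:

```rust
// Illustrative sketch of execution-provider ordering.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Provider {
    Cuda(i32),
    DirectMl(i32),
    Cpu,
}

#[derive(Debug, Clone, Copy)]
enum Device {
    Cpu,
    Cuda(i32),
    DirectMl(i32),
}

fn provider_order(device: Device) -> Vec<Provider> {
    let mut providers = Vec::new();
    match device {
        Device::Cuda(id) => providers.push(Provider::Cuda(id)),
        Device::DirectMl(id) => providers.push(Provider::DirectMl(id)),
        Device::Cpu => {}
    }
    // CPU goes last unconditionally, so inference still works when the
    // GPU provider fails to initialize at runtime.
    providers.push(Provider::Cpu);
    providers
}
```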
Session Creation
From `ort_worker.rs:140`:

```rust
fn new(cfg: Self::RettoWorkerConfig) -> RettoResult<Self> {
    #[cfg(target_family = "wasm")]
    {
        ort::init()
            .with_global_thread_pool(ort::environment::GlobalThreadPoolOptions::default())
            .commit()
            .expect("Cannot initialize ort.");
    }
    let mut providers = Vec::new();
    match cfg.device {
        #[cfg(feature = "backend-ort-cuda")]
        RettoOrtWorkerDevice::Cuda(id) => providers.push(
            CUDAExecutionProvider::default()
                .with_arena_extend_strategy(NextPowerOfTwo)
                .with_conv_algorithm_search(Exhaustive)
                .with_device_id(id)
                .build(),
        ),
        #[cfg(feature = "backend-ort-directml")]
        RettoOrtWorkerDevice::DirectML(id) => providers.push(
            DirectMLExecutionProvider::default()
                .with_device_id(id)
                .build(),
        ),
        _ => {}
    };
    providers.push(CPUExecutionProvider::default().build());
    let det_session = build_ort_session(cfg.models.det.clone(), &providers)?;
    let cls_session = build_ort_session(cfg.models.cls.clone(), &providers)?;
    let rec_session = build_ort_session(cfg.models.rec.clone(), &providers)?;
    // ...
}
```
Key points:
- WebAssembly requires explicit ONNX Runtime initialization
- The CPU provider is always added as a fallback
- Three separate sessions are created for the det/cls/rec models
- All sessions use the same execution providers
CUDA Configuration
From `ort_worker.rs:156`:

```rust
CUDAExecutionProvider::default()
    .with_arena_extend_strategy(NextPowerOfTwo)
    .with_conv_algorithm_search(Exhaustive)
    .with_device_id(id)
    .build()
```
Arena extend strategy: `NextPowerOfTwo` grows the memory arena in power-of-two chunks, reducing fragmentation.

Conv algorithm search: `Exhaustive` benchmarks the available convolution algorithms and picks the fastest one for your GPU (slower startup, faster inference).
Session Building
From `ort_worker.rs:120`:

```rust
fn build_ort_session(
    model_source: RettoWorkerModelSource,
    providers: &[ExecutionProviderDispatch],
) -> RettoResult<ort::session::Session> {
    let builder = ort::session::Session::builder()?
        .with_execution_providers(providers)?;
    let model_source = model_source.resolve()?;
    match model_source {
        #[cfg(not(target_family = "wasm"))]
        RettoWorkerModelResolvedSource::Path(path) => {
            builder.commit_from_file(path).map_err(RettoError::from)
        }
        RettoWorkerModelResolvedSource::Blob(blob) => {
            builder.commit_from_memory(&blob).map_err(RettoError::from)
        }
    }
}
```
Supports both file-based and memory-based model loading.
Inference Implementation
From `ort_worker.rs:188`:

Detection

```rust
fn det(&mut self, input: Array4<f32>) -> RettoResult<Array4<f32>> {
    let outputs = self.det_session.run(ort::inputs! {
        "x" => TensorRef::from_array_view(&input.as_standard_layout())?
    })?;
    let val = &outputs[0]
        .try_extract_array::<f32>()?
        .into_dimensionality::<Ix4>()?;
    let output = val.to_owned();
    Ok(output)
}
```
- Input: `Array4<f32>` with shape `[1, 3, H, W]` (batch, channels, height, width)
- Output: `Array4<f32>` with shape `[1, 1, H, W]` (probability map)
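Downstream postprocessing typically binarizes this probability map into a text mask. A minimal sketch, assuming a flattened map and an illustrative threshold (this step belongs to the detection postprocessor, not the worker):

```rust
// Illustrative: binarize a flattened [H, W] probability map at a fixed
// threshold, as a detection postprocessor might before finding contours.
fn binarize(prob_map: &[f32], threshold: f32) -> Vec<bool> {
    prob_map.iter().map(|&p| p > threshold).collect()
}
```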
Classification
```rust
fn cls(&mut self, input: Array4<f32>) -> RettoResult<Array2<f32>> {
    let outputs = self.cls_session.run(ort::inputs! {
        "x" => TensorRef::from_array_view(&input.as_standard_layout())?
    })?;
    let val = &outputs[0]
        .try_extract_array::<f32>()?
        .into_dimensionality::<Ix2>()?;
    let output = val.to_owned();
    Ok(output)
}
```
- Input: `Array4<f32>` with shape `[N, 3, 48, 192]` (batch of N images)
- Output: `Array2<f32>` with shape `[N, 2]` (class probabilities for 0° and 180°)
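A consumer of this output compares the two scores per image and rotates crops where the 180° class wins. A minimal sketch with an assumed confidence threshold (`should_flip` is illustrative, not Retto API):

```rust
// Illustrative: decide per-image whether to rotate a crop 180°, given the
// [N, 2] class scores as (score_0deg, score_180deg) pairs. Only flip when
// the 180° class both wins and clears a confidence threshold.
fn should_flip(scores: &[(f32, f32)], threshold: f32) -> Vec<bool> {
    scores
        .iter()
        .map(|&(s0, s180)| s180 > s0 && s180 > threshold)
        .collect()
}
```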
Recognition
```rust
fn rec(&mut self, input: Array4<f32>) -> RettoResult<Array3<f32>> {
    let outputs = self.rec_session.run(ort::inputs! {
        "x" => TensorRef::from_array_view(&input.as_standard_layout())?
    })?;
    let val = &outputs[0]
        .try_extract_array::<f32>()?
        .into_dimensionality::<Ix3>()?;
    let output = val.to_owned();
    Ok(output)
}
```
- Input: `Array4<f32>` with shape `[N, 3, 48, W]` (variable width)
- Output: `Array3<f32>` with shape `[N, T, C]` (N sequences of T timesteps over C character classes)
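This output is typically consumed by CTC-style greedy decoding: take the argmax class per timestep, collapse consecutive repeats, and drop the blank class. A minimal single-sequence sketch, assuming blank is class 0 (as in PaddleOCR-style models):

```rust
// Illustrative CTC greedy decode for one sequence: `timesteps` is a [T, C]
// matrix of class scores; `blank` is the blank class index (assumed 0).
fn ctc_greedy_decode(timesteps: &[Vec<f32>], blank: usize) -> Vec<usize> {
    let mut result = Vec::new();
    let mut prev = blank;
    for scores in timesteps {
        // Argmax over the class dimension.
        let best = scores
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap();
        // Collapse consecutive repeats and drop blanks.
        if best != blank && best != prev {
            result.push(best);
        }
        prev = best;
    }
    result
}
```

The decoded class indices are then mapped through the character dictionary to produce the final text.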
Usage Examples
Default Configuration (CPU)
```rust
use retto::prelude::*;

let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig::default(),
    ..Default::default()
};
let mut session = RettoSession::new(config)?;
```
Uses HuggingFace Hub models (if available) with CPU inference.
CUDA GPU
```rust
let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::Cuda(0), // GPU 0
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};
let mut session = RettoSession::new(config)?;
```
Requires:
- Building with the feature enabled: `cargo build --features backend-ort-cuda`
- CUDA toolkit installed
- An NVIDIA GPU
DirectML (Windows)
```rust
let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::DirectML(0),
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};
let mut session = RettoSession::new(config)?;
```
Requires:
- Building with the feature enabled: `cargo build --features backend-ort-directml`
- Windows 10 version 1903 or later
- Any DirectX 12 capable GPU
Custom Model Paths
```rust
let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::CPU,
        models: RettoOrtWorkerModelProvider(RettoWorkerModelProvider {
            det: RettoWorkerModelSource::Path("/models/custom_det.onnx".into()),
            cls: RettoWorkerModelSource::Path("/models/custom_cls.onnx".into()),
            rec: RettoWorkerModelSource::Path("/models/custom_rec.onnx".into()),
        }),
    },
    ..Default::default()
};
let mut session = RettoSession::new(config)?;
```
HuggingFace Custom Repository
```rust
let config = RettoSessionConfig::<RettoOrtWorker> {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::CPU,
        models: RettoOrtWorkerModelProvider(RettoWorkerModelProvider {
            det: RettoWorkerModelSource::HuggingFace {
                repo: "myorg/mymodels".to_string(),
                model: "det_model.onnx".to_string(),
            },
            cls: RettoWorkerModelSource::HuggingFace {
                repo: "myorg/mymodels".to_string(),
                model: "cls_model.onnx".to_string(),
            },
            rec: RettoWorkerModelSource::HuggingFace {
                repo: "myorg/mymodels".to_string(),
                model: "rec_model.onnx".to_string(),
            },
        }),
    },
    ..Default::default()
};
let mut session = RettoSession::new(config)?;
```
Feature Flags
Backend Selection
- `backend-ort`: ONNX Runtime support (enabled by default)
- `backend-ort-cuda`: CUDA execution provider
- `backend-ort-directml`: DirectML execution provider
Model Loading
- `hf-hub`: HuggingFace Hub integration (enabled by default)
- `download-models`: Embed models in the binary (for WebAssembly)
Other Features
- `serde`: Serialization support for configurations
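Features are selected in `Cargo.toml`. A hedged example using the feature names above; the version is a placeholder, so check the crate's published versions before use:

```toml
[dependencies]
# Default features: ONNX Runtime on CPU with HuggingFace Hub downloads.
# retto = "*"

# CUDA-enabled build ("*" is a placeholder version):
retto = { version = "*", features = ["backend-ort-cuda"] }
```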
Memory Usage
- Three sessions: each model (det/cls/rec) loads independently, using ~100-300MB per model
- Model caching: HuggingFace Hub caches models in `~/.cache/huggingface/`
- GPU memory: CUDA allocates arena memory for tensors
Initialization Time
- First run: HuggingFace Hub downloads the models (~50-150MB)
- Session creation: ONNX Runtime optimizes the graphs (~1-5 seconds)
- CUDA initialization: the first inference benchmarks algorithms (~2-10 seconds)
- Subsequent runs: models load from cache (~1-2 seconds)
Inference Speed
Typical timings for a 1920×1080 image with ~10 text regions:

| Backend | Detection | Classification | Recognition | Total |
| --- | --- | --- | --- | --- |
| CPU (Intel i7) | ~300ms | ~20ms | ~100ms | ~420ms |
| CUDA (RTX 3080) | ~40ms | ~5ms | ~15ms | ~60ms |
| DirectML (RTX 3080) | ~80ms | ~10ms | ~30ms | ~120ms |
Actual performance varies based on image size, text density, and hardware capabilities.
Error Handling
Model Loading Errors
```rust
RettoError::ModelNotFoundError(String)
```

Returned when:
- A local path doesn't exist
- An empty blob is provided
- A HuggingFace download fails
ONNX Runtime Errors
```rust
RettoError::OrtError(ort::error::Error)
```

Returned when:
- Session creation fails
- Inference fails
- Tensor conversion fails
- An execution provider is unavailable
Extending with Custom Workers
To implement a custom worker:
1. Implement `RettoInnerWorker` with your inference logic
2. Implement `RettoWorker` with configuration and initialization
3. Define your `RettoWorkerConfig` and `RettoWorkerModelProvider` types
4. Use it with `RettoSession<YourWorker>`
```rust
struct MyCustomWorker {
    // Your implementation
}

impl RettoInnerWorker for MyCustomWorker {
    fn det(&mut self, input: Array4<f32>) -> RettoResult<Array4<f32>> {
        todo!("your detection inference")
    }
    fn cls(&mut self, input: Array4<f32>) -> RettoResult<Array2<f32>> {
        todo!("your classification inference")
    }
    fn rec(&mut self, input: Array4<f32>) -> RettoResult<Array3<f32>> {
        todo!("your recognition inference")
    }
}

impl RettoWorker for MyCustomWorker {
    type RettoWorkerModelProvider = MyModelProvider;
    type RettoWorkerConfig = MyWorkerConfig;

    fn new(cfg: Self::RettoWorkerConfig) -> RettoResult<Self> {
        todo!("initialize your worker")
    }
    fn init(&self) -> RettoResult<()> {
        Ok(())
    }
}
```
Next Steps
- Architecture: understand how workers fit into the overall architecture
- Processors: learn about preprocessing and postprocessing pipelines