Configuration

RettoSessionConfig

Main configuration struct for RettoSession. Controls image preprocessing and the three OCR pipeline stages: text detection, orientation classification, and recognition.

pub struct RettoSessionConfig<W: RettoWorker> {
    pub worker_config: W::RettoWorkerConfig,
    pub max_side_len: usize,
    pub min_side_len: usize,
    pub det_processor_config: DetProcessorConfig,
    pub cls_processor_config: ClsProcessorConfig,
    pub rec_processor_config: RecProcessorConfig,
}

Fields

worker_config

W::RettoWorkerConfig

Backend-specific configuration (e.g., RettoOrtWorkerConfig for ONNX Runtime). Controls model loading, execution providers, and inference settings.

max_side_len

usize

default:"2000"

Maximum length of the longest side when resizing input images. Images whose longest side exceeds this value are scaled down with the aspect ratio preserved. Lower values reduce processing time and memory usage, but small text may lose detail.

min_side_len

usize

default:"30"

Minimum length of the shortest side when resizing. Images whose shortest side is below this value are scaled up, which keeps extremely small images from being processed incorrectly.
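The two limits can be read as a single scale factor applied to the input image. The following is a minimal sketch, assuming the maximum is checked first and the minimum second; `session_scale` is a hypothetical helper written for illustration, not part of retto's API, and the library's actual ordering and rounding may differ. Both dimensions are assumed to be non-zero.

```rust
/// Hypothetical helper: the factor by which a `width` x `height` image
/// would be scaled under `max_side_len` and `min_side_len`.
fn session_scale(width: usize, height: usize, max_side_len: usize, min_side_len: usize) -> f32 {
    let longest = width.max(height) as f32;
    let shortest = width.min(height) as f32;
    let mut scale = 1.0_f32;
    // Shrink when the longest side exceeds the maximum.
    if longest > max_side_len as f32 {
        scale = max_side_len as f32 / longest;
    }
    // Enlarge when the (possibly shrunk) shortest side is still below the minimum.
    if shortest * scale < min_side_len as f32 {
        scale = min_side_len as f32 / shortest;
    }
    scale
}
```

With the defaults (2000 and 30), a 4000x3000 image gets a factor of 0.5, a 100x15 image gets 2.0, and an 800x600 image is left at 1.0.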

det_processor_config

DetProcessorConfig

Configuration for the text detection stage. See DetProcessorConfig below.

cls_processor_config

ClsProcessorConfig

Configuration for the text orientation classification stage. See ClsProcessorConfig below.

rec_processor_config

RecProcessorConfig

Configuration for the text recognition stage. See RecProcessorConfig below.

Example

use retto_core::prelude::*;

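// The worker type is spelled out so the compiler can resolve `W`.
let config: RettoSessionConfig<RettoOrtWorker> = RettoSessionConfig {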
    worker_config: RettoOrtWorkerConfig::default(),
    max_side_len: 1920,  // Full HD width
    min_side_len: 50,    // Minimum readable size
    det_processor_config: DetProcessorConfig {
        box_thresh: 0.6,  // Higher threshold = fewer false positives
        ..Default::default()
    },
    cls_processor_config: ClsProcessorConfig::default(),
    rec_processor_config: RecProcessorConfig::default(),
};

let session = RettoSession::new(config)?;

DetProcessorConfig

Configuration for the DB (Differentiable Binarization) text detection algorithm.

pub struct DetProcessorConfig {
    pub limit_side_len: usize,
    pub limit_type: LimitType,
    pub mean: Array1<f32>,
    pub std: Array1<f32>,
    pub scale: f32,
    pub threch: f32,
    pub box_thresh: f32,
    pub max_candidates: usize,
    pub unclip_ratio: f32,
    pub use_dilation: bool,
    pub score_mode: ScoreMode,
    pub min_mini_box_size: usize,
    pub dilation_kernel: Option<Array2<usize>>,
}

Preprocessing Fields

limit_side_len

usize

default:"736"

Target side length for detection model input. The input image is resized according to limit_type.

limit_type

LimitType

default:"LimitType::Min"

How to apply limit_side_len:

LimitType::Min: Ensure shortest side ≥ limit_side_len
LimitType::Max: Ensure longest side ≤ limit_side_len
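The two modes can be sketched as a resize ratio computed from the image dimensions. This is an illustration under assumptions: the local `LimitType` enum is a stand-in that mirrors the documented variants, and `det_resize_ratio` is a hypothetical function name. DB-style detectors commonly also round the resized dimensions to a multiple of 32; that step is omitted here.

```rust
/// Local stand-in mirroring the documented `LimitType` variants.
#[derive(Clone, Copy)]
enum LimitType {
    Min,
    Max,
}

/// Hypothetical helper: ratio applied to both sides before detection.
fn det_resize_ratio(width: usize, height: usize, limit_side_len: usize, limit_type: LimitType) -> f32 {
    let limit = limit_side_len as f32;
    match limit_type {
        // Scale up until the shortest side reaches the limit.
        LimitType::Min => {
            let shortest = width.min(height) as f32;
            if shortest < limit { limit / shortest } else { 1.0 }
        }
        // Scale down until the longest side fits within the limit.
        LimitType::Max => {
            let longest = width.max(height) as f32;
            if longest > limit { limit / longest } else { 1.0 }
        }
    }
}
```

For example, with limit_side_len set to 736, a 368x1000 image under LimitType::Min is scaled by 2.0, while a 1472x500 image under LimitType::Max is scaled by 0.5.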

mean

Array1<f32>

default:"[0.5, 0.5, 0.5]"

Mean values for image normalization (one per channel: RGB).

std

Array1<f32>

default:"[0.5, 0.5, 0.5]"

Standard deviation values for image normalization.

scale

f32

default:"1.0 / 255.0"

Scale factor applied to pixel values before normalization.
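Taken together, scale, mean, and std describe a per-channel normalization. The sketch below assumes the conventional order used by PaddleOCR-style pipelines, (pixel * scale - mean) / std; `normalize_pixel` is a hypothetical helper that works on a single RGB pixel with plain arrays instead of Array1<f32>.

```rust
/// Hypothetical helper: normalizes one RGB pixel channel by channel.
fn normalize_pixel(rgb: [u8; 3], scale: f32, mean: [f32; 3], std: [f32; 3]) -> [f32; 3] {
    let mut out = [0.0_f32; 3];
    for c in 0..3 {
        // Scale to [0, 1] first, then center and divide by the spread.
        out[c] = (rgb[c] as f32 * scale - mean[c]) / std[c];
    }
    out
}
```

With the default values, a channel value of 0 maps to -1.0 and a channel value of 255 maps to approximately 1.0, so the model input lies in the range [-1, 1].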

Postprocessing Fields

threch

f32

default:"0.3"

Threshold for binarizing the probability map output by the detection model. Pixels with scores > threch are considered text pixels.

box_thresh

f32

default:"0.5"

Minimum average score for a detected text region to be accepted. Higher values reduce false positives but may miss low-confidence text.
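The interplay between threch and box_thresh can be sketched on a small row-major probability map. The function names below are hypothetical and the rectangle-based average is only one way to score a region; this is an illustration of the two thresholds, not retto's implementation.

```rust
/// Marks every pixel whose probability is strictly above `thresh` as text.
fn binarize(prob_map: &[f32], thresh: f32) -> Vec<bool> {
    prob_map.iter().map(|&p| p > thresh).collect()
}

/// Mean probability inside the half-open rectangle [x0, x1) x [y0, y1)
/// of a row-major map that is `map_width` pixels wide.
fn mean_score(prob_map: &[f32], map_width: usize, x0: usize, y0: usize, x1: usize, y1: usize) -> f32 {
    let mut sum = 0.0_f32;
    let mut count = 0_usize;
    for y in y0..y1 {
        for x in x0..x1 {
            sum += prob_map[y * map_width + x];
            count += 1;
        }
    }
    if count == 0 { 0.0 } else { sum / count as f32 }
}

/// A region is kept when its average score reaches `box_thresh`.
fn keep_box(score: f32, box_thresh: f32) -> bool {
    score >= box_thresh
}
```

On the 3x2 map [0.2, 0.8, 0.6, 0.4, 1.0, 0.0], the rectangle covering the two right-hand columns averages about 0.6, so it passes the default box_thresh of 0.5.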

max_candidates

usize

default:"1000"

Maximum number of text boxes to output. Limits processing time for images with many text regions.

unclip_ratio

f32

default:"1.6"

Expansion coefficient for the Vatti clipping algorithm. Controls how much detected text regions are expanded. Higher values capture more context around text.
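In DB-style postprocessing the expansion is usually expressed as an offset distance, area * unclip_ratio / perimeter, by which the polygon is grown. The sketch below computes only that distance and assumes retto follows this common formula; `unclip_distance` is a hypothetical name, and the polygon is assumed to be non-degenerate (non-zero perimeter).

```rust
/// Hypothetical helper: offset distance for expanding a polygon,
/// computed as area * unclip_ratio / perimeter.
fn unclip_distance(points: &[(f32, f32)], unclip_ratio: f32) -> f32 {
    let n = points.len();
    let mut twice_area = 0.0_f32;
    let mut perimeter = 0.0_f32;
    for i in 0..n {
        let (x0, y0) = points[i];
        let (x1, y1) = points[(i + 1) % n];
        // Shoelace term for the signed area.
        twice_area += x0 * y1 - x1 * y0;
        // Length of the edge from point i to point i + 1.
        perimeter += ((x1 - x0).powi(2) + (y1 - y0).powi(2)).sqrt();
    }
    let area = twice_area.abs() / 2.0;
    area * unclip_ratio / perimeter
}
```

For a 100x20 px text box, the default ratio of 1.6 gives an offset of about 13.3 px, and a ratio of 2.0 gives about 16.7 px.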

use_dilation

bool

default:"true"

Whether to apply morphological dilation to the binary mask before contour detection. Helps connect broken text regions.

score_mode

ScoreMode

default:"ScoreMode::Fast"

Method for calculating text region scores:

ScoreMode::Fast: Average score within bounding rectangle (faster)
ScoreMode::Slow: Average score within actual polygon (more accurate)

min_mini_box_size

usize

default:"3"

Minimum side length in pixels for detected text boxes. Boxes smaller than this are filtered out.

dilation_kernel

Option<Array2<usize>>

default:"Some([[1, 1], [1, 1]])"

Kernel for morphological dilation (if use_dilation is true). A 2x2 kernel of ones is used by default.

Example

let det_config = DetProcessorConfig {
    limit_side_len: 960,
    threch: 0.35,
    box_thresh: 0.55,
    unclip_ratio: 1.8,  // More padding around text
    ..Default::default()
};

ClsProcessorConfig

Configuration for the text orientation classification stage.

pub struct ClsProcessorConfig {
    pub image_shape: [usize; 3],
    pub batch_num: usize,
    pub thresh: f32,
    pub label: Vec<u16>,
}

Fields

image_shape

[usize; 3]

default:"[3, 48, 192]"

Target shape for classification input images in format [channels, height, width]. Detected text regions are resized to this shape.

batch_num

usize

default:"6"

Batch size for classification inference. Multiple text regions are processed together for efficiency.

thresh

f32

default:"0.9"

Confidence threshold for applying 180° rotation. If the model predicts 180° rotation with score ≥ thresh, the image is rotated before recognition.

label

Vec<u16>

default:"[0, 180]"

Angle values corresponding to model output classes. Index 0 = 0°, index 1 = 180°.
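How thresh and label combine can be sketched as a small decision function. `should_rotate` is a hypothetical helper: it assumes the model emits one score per entry in label and that the highest-scoring class is the prediction.

```rust
/// Hypothetical helper: decides whether a crop is rotated by 180 degrees.
fn should_rotate(scores: &[f32], label: &[u16], thresh: f32) -> bool {
    // Guard against empty or mismatched inputs.
    if scores.is_empty() || scores.len() != label.len() {
        return false;
    }
    // Find the class with the highest score.
    let mut best = 0;
    for (i, &s) in scores.iter().enumerate() {
        if s > scores[best] {
            best = i;
        }
    }
    // Rotate only when that class is 180 degrees and the score reaches the threshold.
    label[best] == 180 && scores[best] >= thresh
}
```

With label [0, 180], scores of [0.05, 0.95] trigger a rotation at the default threshold of 0.9, while [0.12, 0.88] do not unless thresh is lowered to 0.88 or below.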

Example

let cls_config = ClsProcessorConfig {
    batch_num: 8,     // Process more images per batch
    thresh: 0.85,     // Lower threshold = more rotations applied
    ..Default::default()
};

RecProcessorConfig

Configuration for the text recognition stage.

pub struct RecProcessorConfig {
    pub character_source: RecCharacterDictProvider,
    pub image_shape: [usize; 3],
    pub batch_num: usize,
}

Fields

character_source

RecCharacterDictProvider

Source of the character dictionary used for decoding model outputs. Options:

RecCharacterDictProvider::OutSide(RettoWorkerModelSource): Load from external file or blob
RecCharacterDictProvider::Inline(): Load from model metadata (not yet implemented)

The default depends on enabled features:

With hf-hub: Downloads ppocr_keys_v1.txt from HuggingFace
Without hf-hub (native): Loads from local file path
WebAssembly: Uses embedded blob

image_shape

[usize; 3]

default:"[3, 48, 320]"

Target shape for recognition input images in format [channels, height, width]. Text crops are resized to this height, with width adjusted based on aspect ratio.
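The width adjustment can be sketched as follows. This assumes the scaled width is capped at the width in image_shape, as PaddleOCR-style pipelines commonly do; whether retto caps, pads, or widens the batch instead is not stated here. `rec_resized_width` is a hypothetical helper.

```rust
/// Hypothetical helper: width of a text crop after it is resized to the
/// target height, capped at the target width (an assumption).
fn rec_resized_width(crop_width: usize, crop_height: usize, image_shape: [usize; 3]) -> usize {
    let [_, target_h, target_w] = image_shape;
    // Keep the crop's aspect ratio at the new height.
    let ratio = crop_width as f32 / crop_height as f32;
    let scaled = (target_h as f32 * ratio).ceil() as usize;
    scaled.min(target_w)
}
```

With the default shape [3, 48, 320], a 200x50 crop becomes 192 px wide, and a 1000x50 crop is capped at 320 px.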

batch_num

usize

default:"6"

Batch size for recognition inference. Text regions are batched by similar aspect ratios for efficiency.
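One way to batch by similar aspect ratio is to sort the crops by width/height and split the sorted order into chunks of batch_num. The sketch below illustrates that idea with a hypothetical `batch_by_aspect_ratio` function; it returns indices into the input so results can be mapped back to the original order, and it is not necessarily how retto groups crops.

```rust
/// Hypothetical helper: groups crop indices into batches of similar aspect ratio.
/// `sizes` holds (width, height) pairs.
fn batch_by_aspect_ratio(sizes: &[(usize, usize)], batch_num: usize) -> Vec<Vec<usize>> {
    let mut order: Vec<usize> = (0..sizes.len()).collect();
    // Sort indices by width / height so neighbors have similar shapes.
    order.sort_by(|&a, &b| {
        let ra = sizes[a].0 as f32 / sizes[a].1 as f32;
        let rb = sizes[b].0 as f32 / sizes[b].1 as f32;
        ra.total_cmp(&rb)
    });
    // Split into chunks; a batch size of 0 is treated as 1.
    order.chunks(batch_num.max(1)).map(|c| c.to_vec()).collect()
}
```

For crops of size 300x30, 60x30, 120x30, and 90x30 with a batch size of 2, the batches are [1, 3] and [2, 0].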

Example

use retto_core::prelude::*;

// Using a custom character dictionary
let rec_config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Path("custom_dict.txt".to_string())
    ),
    image_shape: [3, 48, 320],
    batch_num: 10,
};

Character Dictionary Format

The character dictionary should be a plain text file with one character per line:

0
1
2
...
A
B
C
...
你
好
世
界
...

The decoder automatically adds special tokens:

Index 0: "blank" (CTC blank token)
Last index: " " (space character)
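The resulting index layout can be sketched as below, together with a standard greedy CTC decode (collapse repeats, drop blanks). Both `build_charset` and `ctc_greedy_decode` are hypothetical helpers for illustration; the decode step is the textbook algorithm and is assumed, not confirmed, to match retto's decoder.

```rust
/// Hypothetical helper: builds the index-to-character table from dictionary text.
fn build_charset(dict_text: &str) -> Vec<String> {
    let mut chars = Vec::new();
    // Index 0 is reserved for the CTC blank token.
    chars.push("blank".to_string());
    for line in dict_text.lines() {
        chars.push(line.to_string());
    }
    // The last index is the space character.
    chars.push(" ".to_string());
    chars
}

/// Greedy CTC decode: skip blanks and consecutive repeats of the same index.
fn ctc_greedy_decode(indices: &[usize], charset: &[String]) -> String {
    let mut out = String::new();
    let mut prev = None;
    for &idx in indices {
        if idx != 0 && Some(idx) != prev {
            if let Some(ch) = charset.get(idx) {
                out.push_str(ch);
            }
        }
        prev = Some(idx);
    }
    out
}
```

A dictionary containing the lines "a" and "b" yields the table ["blank", "a", "b", " "], and the index sequence [1, 1, 0, 1, 2, 2, 3, 0, 2] decodes to "aab b".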

Default Configuration

All configuration types implement Default, with values chosen for general-purpose OCR:

let config = RettoSessionConfig::<RettoOrtWorker>::default();

This provides:

Detection: 736px model input, threch 0.3, box_thresh 0.5, unclip_ratio 1.6
Classification: 48x192px input, 0.9 rotation threshold, batch size 6
Recognition: 48x320px input, PaddleOCR dictionary, batch size 6
Image resizing: Max 2000px, min 30px

Performance Tuning

For Speed

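// The worker type is spelled out so `..Default::default()` can resolve `W`.
let fast_config: RettoSessionConfig<RettoOrtWorker> = RettoSessionConfig {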
    max_side_len: 1280,  // Smaller input
    det_processor_config: DetProcessorConfig {
        limit_side_len: 640,
        max_candidates: 100,
        ..Default::default()
    },
    cls_processor_config: ClsProcessorConfig {
        batch_num: 16,  // Larger batches
        ..Default::default()
    },
    rec_processor_config: RecProcessorConfig {
        batch_num: 16,
        ..Default::default()
    },
    ..Default::default()
};

For Accuracy

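// The worker type is spelled out so `..Default::default()` can resolve `W`.
let accurate_config: RettoSessionConfig<RettoOrtWorker> = RettoSessionConfig {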
    max_side_len: 3840,  // Support 4K images
    det_processor_config: DetProcessorConfig {
        limit_side_len: 960,
        threch: 0.2,         // Lower threshold = detect more
        box_thresh: 0.4,     // Accept lower confidence
        unclip_ratio: 2.0,   // More context
        score_mode: ScoreMode::Slow,  // More accurate scoring
        ..Default::default()
    },
    ..Default::default()
};
