
RettoSessionConfig

Main configuration struct for RettoSession. Controls image preprocessing and all three OCR pipeline stages.
pub struct RettoSessionConfig<W: RettoWorker> {
    pub worker_config: W::RettoWorkerConfig,
    pub max_side_len: usize,
    pub min_side_len: usize,
    pub det_processor_config: DetProcessorConfig,
    pub cls_processor_config: ClsProcessorConfig,
    pub rec_processor_config: RecProcessorConfig,
}

Fields

worker_config
W::RettoWorkerConfig
Backend-specific configuration (e.g., RettoOrtWorkerConfig for ONNX Runtime). Controls model loading, execution providers, and inference settings.
max_side_len
usize
default:"2000"
Maximum length of the longest side when resizing input images. Images larger than this are scaled down while maintaining aspect ratio. Lower values reduce processing time and memory usage at the cost of fine detail.
min_side_len
usize
default:"30"
Minimum length of the shortest side when resizing. Images smaller than this will be scaled up. Prevents extremely small images from being processed incorrectly.
det_processor_config
DetProcessorConfig
Configuration for the text detection stage. See DetProcessorConfig below.
cls_processor_config
ClsProcessorConfig
Configuration for the text orientation classification stage. See ClsProcessorConfig below.
rec_processor_config
RecProcessorConfig
Configuration for the text recognition stage. See RecProcessorConfig below.
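
The effect of max_side_len and min_side_len can be sketched as a size computation. This is a minimal illustration, not retto's implementation: the function name clamp_image_size is hypothetical, and the rounding and the order in which the two limits are checked are assumptions.

```rust
/// Hypothetical helper (not part of retto): the (width, height) an image
/// would be resized to under `max_side_len` / `min_side_len`, keeping
/// the aspect ratio.
fn clamp_image_size(
    width: usize,
    height: usize,
    max_side_len: usize,
    min_side_len: usize,
) -> (usize, usize) {
    // Degenerate images are returned unchanged to avoid dividing by zero.
    if width == 0 || height == 0 {
        return (width, height);
    }
    let longest = width.max(height) as f32;
    let shortest = width.min(height) as f32;
    // Assumption: the upper limit is checked first, then the lower one.
    let ratio = if longest > max_side_len as f32 {
        max_side_len as f32 / longest
    } else if shortest < min_side_len as f32 {
        min_side_len as f32 / shortest
    } else {
        1.0
    };
    // Both sides use the same ratio, so the aspect ratio is preserved.
    let scale = |side: usize| ((side as f32 * ratio).round() as usize).max(1);
    (scale(width), scale(height))
}
```

With the defaults (2000 and 30), a 4000x2000 image would come out as 2000x1000 under these assumptions, and an 800x600 image would pass through unchanged.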

Example

use retto_core::prelude::*;

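// The worker type cannot be inferred from worker_config alone, so annotate it.
let config: RettoSessionConfig<RettoOrtWorker> = RettoSessionConfig {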
    worker_config: RettoOrtWorkerConfig::default(),
    max_side_len: 1920,  // Full HD width
    min_side_len: 50,    // Minimum readable size
    det_processor_config: DetProcessorConfig {
        box_thresh: 0.6,  // Higher threshold = fewer false positives
        ..Default::default()
    },
    cls_processor_config: ClsProcessorConfig::default(),
    rec_processor_config: RecProcessorConfig::default(),
};

let session = RettoSession::new(config)?;

DetProcessorConfig

Configuration for the DB (Differentiable Binarization) text detection algorithm.
pub struct DetProcessorConfig {
    pub limit_side_len: usize,
    pub limit_type: LimitType,
    pub mean: Array1<f32>,
    pub std: Array1<f32>,
    pub scale: f32,
    pub threch: f32,
    pub box_thresh: f32,
    pub max_candidates: usize,
    pub unclip_ratio: f32,
    pub use_dilation: bool,
    pub score_mode: ScoreMode,
    pub min_mini_box_size: usize,
    pub dilation_kernel: Option<Array2<usize>>,
}

Preprocessing Fields

limit_side_len
usize
default:"736"
Target side length for detection model input. The input image is resized according to limit_type.
limit_type
LimitType
default:"LimitType::Min"
How to apply limit_side_len:
  • LimitType::Min: Ensure shortest side ≥ limit_side_len
  • LimitType::Max: Ensure longest side ≤ limit_side_len
mean
Array1<f32>
default:"[0.5, 0.5, 0.5]"
Mean values for image normalization (one per channel: RGB).
std
Array1<f32>
default:"[0.5, 0.5, 0.5]"
Standard deviation values for image normalization.
scale
f32
default:"1.0 / 255.0"
Scale factor applied to pixel values before normalization.
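
The preprocessing fields can be read as two small computations: a resize ratio chosen by limit_type, and a per-channel normalization. The sketch below assumes the common (pixel * scale - mean) / std form. The source does not state the exact formula, and the enum here is a local stand-in rather than retto's LimitType.

```rust
/// Local stand-in for illustration; not retto's `LimitType`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum LimitType {
    Min,
    Max,
}

/// Hypothetical helper: scale factor applied to both sides of the image.
fn resize_ratio(width: usize, height: usize, limit_side_len: usize, limit_type: LimitType) -> f32 {
    let limit = limit_side_len as f32;
    match limit_type {
        // Upscale only when the shortest side is below the limit.
        LimitType::Min => {
            let shortest = width.min(height) as f32;
            if shortest < limit { limit / shortest } else { 1.0 }
        }
        // Downscale only when the longest side is above the limit.
        LimitType::Max => {
            let longest = width.max(height) as f32;
            if longest > limit { limit / longest } else { 1.0 }
        }
    }
}

/// Hypothetical helper: normalize one RGB pixel.
/// Assumed formula: (pixel * scale - mean) / std, per channel.
fn normalize_pixel(rgb: [u8; 3], scale: f32, mean: [f32; 3], std: [f32; 3]) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for c in 0..3 {
        out[c] = (rgb[c] as f32 * scale - mean[c]) / std[c];
    }
    out
}
```

Under this assumed formula, the default scale, mean, and std map pixel values from 0..=255 to roughly -1.0..=1.0.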

Postprocessing Fields

threch
f32
default:"0.3"
Threshold for binarizing the probability map output by the detection model. Pixels with scores > threch are considered text pixels.
box_thresh
f32
default:"0.5"
Minimum average score for a detected text region to be accepted. Higher values reduce false positives but may miss low-confidence text.
max_candidates
usize
default:"1000"
Maximum number of text boxes to output. Limits processing time for images with many text regions.
unclip_ratio
f32
default:"1.6"
Expansion coefficient for the Vatti clipping algorithm. Controls how much detected text regions are expanded. Higher values capture more context around text.
use_dilation
bool
default:"true"
Whether to apply morphological dilation to the binary mask before contour detection. Helps connect broken text regions.
score_mode
ScoreMode
default:"ScoreMode::Fast"
Method for calculating text region scores:
  • ScoreMode::Fast: Average score within bounding rectangle (faster)
  • ScoreMode::Slow: Average score within actual polygon (more accurate)
min_mini_box_size
usize
default:"3"
Minimum side length in pixels for detected text boxes. Boxes smaller than this are filtered out.
dilation_kernel
Option<Array2<usize>>
default:"Some([[1, 1], [1, 1]])"
Kernel for morphological dilation (if use_dilation is true). A 2x2 kernel of ones is used by default.
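
To show how the postprocessing thresholds interact, here is a rough sketch of the main steps on a plain 2D probability map. All function names are hypothetical. The unclip distance formula (area * unclip_ratio / perimeter) is the one commonly used in DB postprocessing and is assumed here, not confirmed for retto.

```rust
/// Binarize a probability map: 1 where prob > threch, else 0.
fn binarize(prob: &[Vec<f32>], threch: f32) -> Vec<Vec<u8>> {
    prob.iter()
        .map(|row| row.iter().map(|&p| (p > threch) as u8).collect())
        .collect()
}

/// Fast-mode style score: mean probability inside the axis-aligned
/// rectangle [x0, x1) x [y0, y1).
fn rect_score(prob: &[Vec<f32>], x0: usize, y0: usize, x1: usize, y1: usize) -> f32 {
    let mut sum = 0.0f32;
    let mut n = 0usize;
    for row in &prob[y0..y1] {
        for &p in &row[x0..x1] {
            sum += p;
            n += 1;
        }
    }
    if n == 0 { 0.0 } else { sum / n as f32 }
}

/// Assumed offset distance used to expand a region of the given area and perimeter.
fn unclip_distance(area: f32, perimeter: f32, unclip_ratio: f32) -> f32 {
    area * unclip_ratio / perimeter
}

/// A candidate is kept when its score reaches box_thresh and its short
/// side is at least min_mini_box_size pixels.
fn keep_box(score: f32, box_thresh: f32, short_side: f32, min_mini_box_size: usize) -> bool {
    score >= box_thresh && short_side >= min_mini_box_size as f32
}
```

For a 20x10 region (area 200, perimeter 60), the default unclip_ratio of 1.6 would give an offset of about 5.3 pixels under this formula.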

Example

let det_config = DetProcessorConfig {
    limit_side_len: 960,
    threch: 0.35,
    box_thresh: 0.55,
    unclip_ratio: 1.8,  // More padding around text
    ..Default::default()
};

ClsProcessorConfig

Configuration for the text orientation classification stage.
pub struct ClsProcessorConfig {
    pub image_shape: [usize; 3],
    pub batch_num: usize,
    pub thresh: f32,
    pub label: Vec<u16>,
}

Fields

image_shape
[usize; 3]
default:"[3, 48, 192]"
Target shape for classification input images in format [channels, height, width]. Detected text regions are resized to this shape.
batch_num
usize
default:"6"
Batch size for classification inference. Multiple text regions are processed together for efficiency.
thresh
f32
default:"0.9"
Confidence threshold for applying 180° rotation. If the model predicts 180° rotation with score ≥ thresh, the image is rotated before recognition.
label
Vec<u16>
default:"[0, 180]"
Angle values corresponding to model output classes. Index 0 = 0°, index 1 = 180°.
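
The rotation rule and batching described above can be expressed in a few lines. This is only an illustration of the documented behavior. The helper names are hypothetical, and the model call that produces class_index and score is omitted.

```rust
/// Returns true when the predicted class maps to 180 degrees in `label`
/// and its score reaches `thresh`.
fn needs_rotation(label: &[u16], class_index: usize, score: f32, thresh: f32) -> bool {
    label.get(class_index) == Some(&180) && score >= thresh
}

/// Splits `count` text crops into [start, end) ranges of at most
/// `batch_num` items each.
fn batch_ranges(count: usize, batch_num: usize) -> Vec<(usize, usize)> {
    let step = batch_num.max(1); // guard against a zero batch size
    (0..count)
        .step_by(step)
        .map(|start| (start, (start + step).min(count)))
        .collect()
}
```

With the default batch_num of 6, 14 detected regions would be classified in three batches of 6, 6, and 2 crops.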

Example

let cls_config = ClsProcessorConfig {
    batch_num: 8,     // Process more images per batch
    thresh: 0.85,     // Lower threshold = more rotations applied
    ..Default::default()
};

RecProcessorConfig

Configuration for the text recognition stage.
pub struct RecProcessorConfig {
    pub character_source: RecCharacterDictProvider,
    pub image_shape: [usize; 3],
    pub batch_num: usize,
}

Fields

character_source
RecCharacterDictProvider
Source of the character dictionary used for decoding model outputs. Options:
  • RecCharacterDictProvider::OutSide(RettoWorkerModelSource): Load from external file or blob
  • RecCharacterDictProvider::Inline(): Load from model metadata (not yet implemented)
The default depends on enabled features:
  • With hf-hub: Downloads ppocr_keys_v1.txt from HuggingFace
  • Without hf-hub (native): Loads from local file path
  • WebAssembly: Uses embedded blob
image_shape
[usize; 3]
default:"[3, 48, 320]"
Target shape for recognition input images in format [channels, height, width]. Text crops are resized to this height, with width adjusted based on aspect ratio.
batch_num
usize
default:"6"
Batch size for recognition inference. Text regions are batched by similar aspect ratios for efficiency.
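
A rough sketch of how a crop's width might follow from its aspect ratio, and how crops could be ordered before batching. Capping the width at the configured image_shape width is an assumption, since an implementation may instead pad or widen the batch. Both helper names are hypothetical.

```rust
/// Hypothetical helper: width of a crop after resizing it to `target_h`,
/// keeping the aspect ratio and capping at `max_w` (assumed behavior).
fn resized_width(crop_w: usize, crop_h: usize, target_h: usize, max_w: usize) -> usize {
    let ratio = crop_w as f32 / crop_h as f32;
    let w = (target_h as f32 * ratio).ceil() as usize;
    w.min(max_w).max(1)
}

/// Hypothetical helper: indices of crops sorted by width/height ratio, so
/// that neighbours in a batch have similar shapes.
fn order_by_aspect_ratio(sizes: &[(usize, usize)]) -> Vec<usize> {
    let ratio = |i: usize| sizes[i].0 as f32 / sizes[i].1 as f32;
    let mut idx: Vec<usize> = (0..sizes.len()).collect();
    idx.sort_by(|&a, &b| ratio(a).total_cmp(&ratio(b)));
    idx
}
```

Grouping crops of similar shape keeps the amount of padding per batch small, which is the usual motivation for this kind of ordering.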

Example

use retto_core::prelude::*;

// Using a custom character dictionary
let rec_config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Path("custom_dict.txt".to_string())
    ),
    image_shape: [3, 48, 320],
    batch_num: 10,
};

Character Dictionary Format

The character dictionary should be a plain text file with one character per line:
0
1
2
...
A
B
C
...
The decoder automatically adds special tokens:
  • Index 0: "blank" (CTC blank token)
  • Last index: " " (space character)
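
The index layout above can be illustrated with a greedy CTC decoder. This is a minimal sketch under stated assumptions: the function names are hypothetical, the input is already the per-timestep argmax index sequence, and retto's actual decoder may differ, for example by also tracking confidence scores.

```rust
/// Builds the vocabulary: "blank" at index 0, one entry per dictionary
/// line, and a space appended as the last index.
fn build_vocab(dict_text: &str) -> Vec<String> {
    let mut vocab = vec!["blank".to_string()];
    vocab.extend(dict_text.lines().map(|line| line.to_string()));
    vocab.push(" ".to_string());
    vocab
}

/// Greedy CTC decoding: collapse consecutive repeats, then drop blanks.
fn ctc_greedy_decode(indices: &[usize], vocab: &[String]) -> String {
    let mut out = String::new();
    let mut prev = 0usize; // start as if the previous step was blank
    for &i in indices {
        if i != 0 && i != prev {
            // Out-of-range indices are skipped rather than panicking.
            if let Some(token) = vocab.get(i) {
                out.push_str(token);
            }
        }
        prev = i;
    }
    out
}
```

A blank between two identical indices is what allows a repeated character to survive the collapse step.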

Default Configuration

All configuration types implement Default, with values chosen for general-purpose OCR:
let config = RettoSessionConfig::<RettoOrtWorker>::default();
This provides:
  • Detection: limit_side_len 736px, threch 0.3, box_thresh 0.5, unclip_ratio 1.6
  • Classification: 48x192px input, 0.9 rotation threshold, batch size 6
  • Recognition: 48x320px input, PaddleOCR dictionary, batch size 6
  • Image resizing: Max 2000px, min 30px

Performance Tuning

For Speed

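// Annotate the worker type so that ..Default::default() can be resolved.
let fast_config: RettoSessionConfig<RettoOrtWorker> = RettoSessionConfig {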
    max_side_len: 1280,  // Smaller input
    det_processor_config: DetProcessorConfig {
        limit_side_len: 640,
        max_candidates: 100,
        ..Default::default()
    },
    cls_processor_config: ClsProcessorConfig {
        batch_num: 16,  // Larger batches
        ..Default::default()
    },
    rec_processor_config: RecProcessorConfig {
        batch_num: 16,
        ..Default::default()
    },
    ..Default::default()
};

For Accuracy

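// Annotate the worker type so that ..Default::default() can be resolved.
let accurate_config: RettoSessionConfig<RettoOrtWorker> = RettoSessionConfig {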
    max_side_len: 3840,  // Support 4K images
    det_processor_config: DetProcessorConfig {
        limit_side_len: 960,
        threch: 0.2,         // Lower threshold = detect more
        box_thresh: 0.4,     // Accept lower confidence
        unclip_ratio: 2.0,   // More context
        score_mode: ScoreMode::Slow,  // More accurate scoring
        ..Default::default()
    },
    ..Default::default()
};
