RettoSessionConfig
Main configuration struct for RettoSession. Controls image preprocessing and all three OCR pipeline stages.
Fields
Backend-specific configuration (e.g., RettoOrtWorkerConfig for ONNX Runtime). Controls model loading, execution providers, and inference settings.
Maximum length of the longest side when resizing input images. Images larger than this will be scaled down while maintaining aspect ratio. This affects processing speed and memory usage.
Minimum length of the shortest side when resizing. Images smaller than this will be scaled up. Prevents extremely small images from being processed incorrectly.
Configuration for the text detection stage. See DetProcessorConfig below.
Configuration for the text orientation classification stage. See ClsProcessorConfig below.
Configuration for the text recognition stage. See RecProcessorConfig below.
Example
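The two resizing limits described above can be illustrated with a small standalone sketch. The function name clamp_image_size and the parameter names max_side_len and min_side_len are assumptions made for illustration and are not taken from the crate; the order in which the two limits are applied is also an assumption.

```rust
/// Hypothetical helper (not part of the crate): compute an output size so that
/// the longest side is at most `max_side_len` and the shortest side is at
/// least `min_side_len`, preserving the aspect ratio.
fn clamp_image_size(width: u32, height: u32, max_side_len: u32, min_side_len: u32) -> (u32, u32) {
    let longest = width.max(height) as f32;
    let shortest = width.min(height) as f32;

    // Scale down when the longest side exceeds the maximum.
    let mut ratio = 1.0_f32;
    if longest > max_side_len as f32 {
        ratio = max_side_len as f32 / longest;
    }
    // Scale up when the shortest side would fall below the minimum.
    // Assumption: the minimum wins if both limits cannot be met at once.
    if shortest * ratio < min_side_len as f32 {
        ratio = min_side_len as f32 / shortest;
    }

    let w = ((width as f32 * ratio).round() as u32).max(1);
    let h = ((height as f32 * ratio).round() as u32).max(1);
    (w, h)
}
```

With limits of 2000 and 30, a 4000x2000 image would come out as 2000x1000 under this sketch, and a 100x15 image as 200x30.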
DetProcessorConfig
Configuration for the DB (Differentiable Binarization) text detection algorithm.
Preprocessing Fields
Target side length for detection model input. The input image is resized according to limit_type.
How to apply limit_side_len:
- LimitType::Min: Ensure shortest side ≥ limit_side_len
- LimitType::Max: Ensure longest side ≤ limit_side_len
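As a rough illustration of the two modes, the sketch below computes the scale factor each one would imply. The enum here is a local stand-in that reuses the LimitType name from the text; the function resize_ratio is hypothetical, and any rounding of the final size to a model-friendly multiple is left out.

```rust
/// Local stand-in for the crate's LimitType, defined here so the sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LimitType {
    Min,
    Max,
}

/// Hypothetical helper: scale factor implied by `limit_side_len` and `limit_type`.
fn resize_ratio(width: u32, height: u32, limit_side_len: u32, limit_type: LimitType) -> f32 {
    let limit = limit_side_len as f32;
    match limit_type {
        // Min: enlarge only when the shortest side is below the limit.
        LimitType::Min => {
            let shortest = width.min(height) as f32;
            if shortest < limit { limit / shortest } else { 1.0 }
        }
        // Max: shrink only when the longest side is above the limit.
        LimitType::Max => {
            let longest = width.max(height) as f32;
            if longest > limit { limit / longest } else { 1.0 }
        }
    }
}
```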
Mean values for image normalization (one per channel: RGB).
Standard deviation values for image normalization.
Scale factor applied to pixel values before normalization.
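The three normalization fields combine in the usual way: scale the raw pixel value, subtract the per-channel mean, then divide by the per-channel standard deviation. The sketch below assumes exactly that order; the function name and the sample values in the usage note are illustrative and are not claimed to be the crate's defaults.

```rust
/// Hypothetical helper: normalize one RGB pixel as (pixel * scale - mean) / std.
fn normalize_pixel(rgb: [u8; 3], scale: f32, mean: [f32; 3], std: [f32; 3]) -> [f32; 3] {
    let mut out = [0.0_f32; 3];
    for c in 0..3 {
        // Scaling happens before mean/std normalization, as the field description states.
        out[c] = (rgb[c] as f32 * scale - mean[c]) / std[c];
    }
    out
}
```

For instance, with a scale of 1/255 and a mean and standard deviation of 0.5 per channel, pixel values are mapped from the 0..255 range to roughly -1..1.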
Postprocessing Fields
Threshold for binarizing the probability map output by the detection model. Pixels with scores > thresh are considered text pixels.
Minimum average score for a detected text region to be accepted. Higher values reduce false positives but may miss low-confidence text.
Maximum number of text boxes to output. Limits processing time for images with many text regions.
Expansion coefficient for the Vatti clipping algorithm. Controls how much detected text regions are expanded. Higher values capture more context around text.
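In the DB paper, the expansion offset is derived from the polygon's area and perimeter as distance = area × ratio / perimeter, and the polygon is then offset outward by that distance with Vatti clipping. The sketch below computes only that distance; whether this crate uses the identical formula is an assumption, and the function names and the unclip_ratio parameter name are hypothetical.

```rust
/// Polygon area via the shoelace formula.
fn polygon_area(points: &[(f32, f32)]) -> f32 {
    let n = points.len();
    let mut twice_area = 0.0_f32;
    for i in 0..n {
        let (x0, y0) = points[i];
        let (x1, y1) = points[(i + 1) % n];
        twice_area += x0 * y1 - x1 * y0;
    }
    twice_area.abs() / 2.0
}

/// Sum of edge lengths of the closed polygon.
fn polygon_perimeter(points: &[(f32, f32)]) -> f32 {
    let n = points.len();
    (0..n)
        .map(|i| {
            let (x0, y0) = points[i];
            let (x1, y1) = points[(i + 1) % n];
            ((x1 - x0).powi(2) + (y1 - y0).powi(2)).sqrt()
        })
        .sum()
}

/// Offset distance for expanding a detected region (assumes a non-degenerate polygon).
fn unclip_distance(points: &[(f32, f32)], unclip_ratio: f32) -> f32 {
    polygon_area(points) * unclip_ratio / polygon_perimeter(points)
}
```

A larger coefficient yields a proportionally larger offset, which is why higher values capture more context around the text.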
Whether to apply morphological dilation to the binary mask before contour detection. Helps connect broken text regions.
Method for calculating text region scores:
- ScoreMode::Fast: Average score within bounding rectangle (faster)
- ScoreMode::Slow: Average score within actual polygon (more accurate)
Minimum side length in pixels for detected text boxes. Boxes smaller than this are filtered out.
Kernel for morphological dilation (if use_dilation is true). A 2x2 kernel of ones is used by default.
Example
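The postprocessing thresholds can be illustrated on a tiny probability map. This is a minimal sketch, not the crate's implementation: the function names, the min_region_score parameter name, and the use of ≥ for the acceptance test are all assumptions. The region score follows the rectangle-based (fast) method described above.

```rust
/// Binarize a row-major probability map: a pixel counts as text if its score > thresh.
fn binarize(prob_map: &[f32], thresh: f32) -> Vec<bool> {
    prob_map.iter().map(|&p| p > thresh).collect()
}

/// Mean score inside an axis-aligned rectangle with inclusive corners.
fn rect_mean_score(
    prob_map: &[f32],
    map_width: usize,
    x0: usize,
    y0: usize,
    x1: usize,
    y1: usize,
) -> f32 {
    let mut sum = 0.0_f32;
    let mut count = 0_usize;
    for y in y0..=y1 {
        for x in x0..=x1 {
            sum += prob_map[y * map_width + x];
            count += 1;
        }
    }
    sum / count as f32
}

/// Keep a region only if its average score reaches the minimum (assumed inclusive).
fn accept_region(score: f32, min_region_score: f32) -> bool {
    score >= min_region_score
}
```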
ClsProcessorConfig
Configuration for the text orientation classification stage.
Fields
Target shape for classification input images in format [channels, height, width]. Detected text regions are resized to this shape.
Batch size for classification inference. Multiple text regions are processed together for efficiency.
Confidence threshold for applying 180° rotation. If the model predicts 180° rotation with score ≥ thresh, the image is rotated before recognition.
Angle values corresponding to model output classes. Index 0 = 0°, index 1 = 180°.
Example
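The rotation rule described above can be expressed in a few lines. This is a sketch under assumptions: the function name and the label_list parameter name are hypothetical, and the angle list is modeled as plain integers.

```rust
/// Hypothetical helper: decide whether a text crop should be rotated by 180°.
/// `label_list` maps model output classes to angles (index 0 = 0, index 1 = 180).
fn should_rotate_180(label_list: &[u16], class_index: usize, score: f32, thresh: f32) -> bool {
    // Rotate only when the predicted class is 180° and the score reaches the threshold.
    label_list.get(class_index) == Some(&180) && score >= thresh
}
```

A higher threshold makes rotation more conservative: a crop predicted as upside down with a low score is passed to recognition unchanged.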
RecProcessorConfig
Configuration for the text recognition stage.
Fields
Source of the character dictionary used for decoding model outputs. Options:
- RecCharacterDictProvider::OutSide(RettoWorkerModelSource): Load from external file or blob
- RecCharacterDictProvider::Inline(): Load from model metadata (not yet implemented)

- With hf-hub: Downloads ppocr_keys_v1.txt from HuggingFace
- Without hf-hub (native): Loads from local file path
- WebAssembly: Uses embedded blob
Target shape for recognition input images in format [channels, height, width]. Text crops are resized to this height, with width adjusted based on aspect ratio.
Batch size for recognition inference. Text regions are batched by similar aspect ratios for efficiency.
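The aspect-ratio-based width adjustment might look like the sketch below. Capping the result at the configured width is an assumption borrowed from common PaddleOCR-style preprocessing, and the function name is hypothetical.

```rust
/// Hypothetical helper: width of a text crop after resizing it to `target_height`
/// while keeping its aspect ratio, capped at `max_width` (the cap is an assumption).
fn rec_resize_width(crop_width: u32, crop_height: u32, target_height: u32, max_width: u32) -> u32 {
    let ratio = crop_width as f32 / crop_height as f32;
    let scaled = (target_height as f32 * ratio).ceil() as u32;
    scaled.clamp(1, max_width)
}
```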
Example
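To show how a character dictionary is typically used at decode time, here is a minimal sketch of building a vocabulary and running greedy CTC decoding over per-step class indices. It assumes the layout given in the Character Dictionary Format section (blank token at index 0, space at the last index) and that these two entries are added around the file's own lines; both function names are hypothetical and none of this is claimed to mirror the crate's internals.

```rust
/// Build a vocabulary from dictionary text with one character per line.
/// Assumption: "blank" is placed at index 0 and a space at the last index.
fn build_vocab(dict_text: &str) -> Vec<String> {
    let mut vocab = vec!["blank".to_string()];
    vocab.extend(
        dict_text
            .lines()
            .filter(|line| !line.is_empty())
            .map(|line| line.to_string()),
    );
    vocab.push(" ".to_string());
    vocab
}

/// Greedy CTC decoding: drop blanks (index 0) and collapse consecutive repeats.
fn ctc_greedy_decode(indices: &[usize], vocab: &[String]) -> String {
    let mut out = String::new();
    let mut prev = usize::MAX;
    for &idx in indices {
        if idx != 0 && idx != prev {
            if let Some(token) = vocab.get(idx) {
                out.push_str(token);
            }
        }
        prev = idx;
    }
    out
}
```

The blank token is what allows repeated characters to survive decoding: two identical indices separated by a blank are kept as two characters, while adjacent duplicates are merged.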
Character Dictionary Format
The character dictionary should be a plain text file with one character per line:
- Index 0: "blank" (CTC blank token)
- Last index: " " (space character)
Default Configuration
All configuration types implement Default, with values tuned for general-purpose OCR:
- Detection: 736px model input, 0.3/0.5 thresholds, 1.6x expansion
- Classification: 48x192px input, 0.9 rotation threshold, batch size 6
- Recognition: 48x320px input, PaddleOCR dictionary, batch size 6
- Image resizing: Max 2000px, min 30px
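The usual Rust pattern with Default is to override only the values you care about using struct update syntax. The sketch below uses local stand-in structs so it compiles on its own. All struct and field names are hypothetical; the 3-channel shapes and the assignment of 0.3 to the binarization threshold and 0.5 to the region-score threshold are assumptions. Only the numeric values come from the list above.

```rust
// Stand-in types mirroring the listed defaults; not the crate's definitions.
#[derive(Debug, Clone, PartialEq)]
struct DetSketch {
    limit_side_len: u32,
    thresh: f32,
    box_thresh: f32,
    unclip_ratio: f32,
}

impl Default for DetSketch {
    fn default() -> Self {
        Self { limit_side_len: 736, thresh: 0.3, box_thresh: 0.5, unclip_ratio: 1.6 }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct ClsSketch {
    image_shape: [usize; 3], // [channels, height, width]
    thresh: f32,
    batch_num: usize,
}

impl Default for ClsSketch {
    fn default() -> Self {
        Self { image_shape: [3, 48, 192], thresh: 0.9, batch_num: 6 }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct RecSketch {
    image_shape: [usize; 3], // [channels, height, width]
    batch_num: usize,
}

impl Default for RecSketch {
    fn default() -> Self {
        Self { image_shape: [3, 48, 320], batch_num: 6 }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct SessionSketch {
    max_side_len: u32,
    min_side_len: u32,
    det: DetSketch,
    cls: ClsSketch,
    rec: RecSketch,
}

impl Default for SessionSketch {
    fn default() -> Self {
        Self {
            max_side_len: 2000,
            min_side_len: 30,
            det: DetSketch::default(),
            cls: ClsSketch::default(),
            rec: RecSketch::default(),
        }
    }
}
```

With these stand-ins, SessionSketch { max_side_len: 1280, ..Default::default() } changes a single limit and keeps every other value at its default.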
