Overview
The Detection Processor (DetProcessor) implements the DB (Differentiable Binarization) algorithm for text detection in images. It identifies text regions and returns their bounding boxes with confidence scores.
Source: retto-core/src/processor/det_processor.rs
DetProcessor
The main detection processor structure that handles text region detection.
Constructor
Detection processor configuration
Original image height after initial resize
Original image width after initial resize
Process Method
Input image as a 3D array (height × width × channels) in RGB format
Worker function that runs model inference on preprocessed data
Detection results containing bounding boxes and scores
DetProcessorConfig
Configuration structure for the detection processor implementing the DB algorithm.
Fields
Preprocessing
Limit side length of input image. Used to resize the input image before processing.
Input image side length restriction type. Controls how limit_side_len is applied.
Channel-wise mean values for image normalization (RGB channels).
Channel-wise standard deviation values for image normalization (RGB channels).
Initial scale factor applied to pixel values before normalization.
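The interaction between limit_side_len and the limit type can be sketched as below. This is a minimal standalone illustration, not the crate's code: the enum SketchLimit, its variant names, and the function limited_size are hypothetical, and the rounding behavior is an assumption. It only shows that one mode scales the image up when its shortest side is too small, while the other scales it down when its longest side is too large.

```rust
/// Hypothetical stand-in for the limit type described above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SketchLimit {
    /// The shortest side must be at least `limit_side_len`.
    Min,
    /// The longest side must be at most `limit_side_len`.
    Max,
}

/// Returns (height, width) after applying the side-length limit while
/// preserving the aspect ratio. Sizes already within the limit are unchanged.
/// Assumes `h` and `w` are both non-zero.
fn limited_size(h: usize, w: usize, limit_side_len: usize, limit: SketchLimit) -> (usize, usize) {
    let limit_f = limit_side_len as f64;
    let ratio = match limit {
        SketchLimit::Min => {
            let shortest = h.min(w) as f64;
            if shortest < limit_f { limit_f / shortest } else { 1.0 }
        }
        SketchLimit::Max => {
            let longest = h.max(w) as f64;
            if longest > limit_f { limit_f / longest } else { 1.0 }
        }
    };
    // Round to the nearest pixel and never return a zero-sized side.
    let scale = |side: usize| ((side as f64 * ratio).round() as usize).max(1);
    (scale(h), scale(w))
}
```

With these assumptions, a 1000 × 2000 image limited to 960 on its longest side becomes 480 × 960, and a 500 × 400 image is left unchanged.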
Postprocessing
In the probability map output by DB, only pixels with scores greater than this threshold are considered to be text pixels. Lower values detect more regions but may include false positives.
If the average score of all pixels inside a detected region is greater than this threshold, the region is accepted as a text area. Higher values require more confident detections.
Maximum number of text boxes to output. Limits the number of detected regions.
Expansion coefficient for the Vatti clipping algorithm. This method expands the detected text area to ensure complete text coverage. Larger values expand the region further.
Whether to expand the segmentation results using morphological dilation. Helps connect nearby text regions.
DB detection result scoring method. Determines how confidence scores are calculated.
Minimum side length threshold for text boxes. Boxes smaller than this are filtered out.
Morphological dilation kernel. Used when use_dilation is true. A 2×2 kernel of ones by default.
Example
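As a minimal sketch of how the pixel threshold and the 2×2 dilation kernel described above might act on a probability map, consider the following. The function names binarize and dilate_2x2 are hypothetical, the map is a plain nested Vec rather than the crate's array type, and the kernel anchor (bottom-right cell) is an assumption that a real implementation may choose differently.

```rust
/// Marks pixels whose probability is strictly greater than `thresh`.
fn binarize(prob: &[Vec<f32>], thresh: f32) -> Vec<Vec<bool>> {
    prob.iter()
        .map(|row| row.iter().map(|&p| p > thresh).collect())
        .collect()
}

/// Dilates a rectangular binary mask with a 2x2 kernel of ones.
/// Assumption: the kernel anchor is its bottom-right cell, so each output
/// pixel looks at itself and its upper and left neighbours.
fn dilate_2x2(mask: &[Vec<bool>]) -> Vec<Vec<bool>> {
    let h = mask.len();
    let w = if h > 0 { mask[0].len() } else { 0 };
    let mut out = vec![vec![false; w]; h];
    for y in 0..h {
        for x in 0..w {
            // Clamp the window at the top and left image borders.
            let y0 = y.saturating_sub(1);
            let x0 = x.saturating_sub(1);
            out[y][x] = (y0..=y).any(|yy| (x0..=x).any(|xx| mask[yy][xx]));
        }
    }
    out
}
```

A single text pixel grows into a 2×2 block under this kernel, which is how dilation can bridge one-pixel gaps between neighbouring regions.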
DetProcessorResult
Result structure containing all detected text regions.
Vector of individual detection results, sorted by position (top-to-bottom, left-to-right).
DetProcessorInnerResult
Individual detection result for a single text region.
Bounding box of the detected text region as a quadrilateral. Points are ordered clockwise starting from the top-left corner.
Confidence score for this detection (0.0 to 1.0). Higher values indicate more confident detections.
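The top-to-bottom, left-to-right ordering of results can be illustrated with a small sketch. SketchDetection and sort_reading_order are hypothetical names, and the ordering rule is an assumption: it compares only the top-left corner and applies no tolerance for boxes on the same text line whose y coordinates differ by a few pixels, which a real implementation may handle.

```rust
/// Hypothetical, simplified detection item: top-left corner plus score.
#[derive(Clone, Debug, PartialEq)]
struct SketchDetection {
    /// Top-left corner as (x, y).
    top_left: (f32, f32),
    /// Confidence in the range 0.0 to 1.0.
    score: f32,
}

/// Sorts detections top-to-bottom, then left-to-right.
/// Assumption: only the top-left corner is compared.
fn sort_reading_order(items: &mut [SketchDetection]) {
    items.sort_by(|a, b| {
        // Compare y first (rows), then x (position within a row).
        a.top_left.1
            .total_cmp(&b.top_left.1)
            .then(a.top_left.0.total_cmp(&b.top_left.0))
    });
}
```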
PointBox Structure
A rectangular point frame representing the detected text region.
Top-left corner of the bounding box
Top-right corner of the bounding box
Bottom-right corner of the bounding box
Bottom-left corner of the bounding box
All four corner points as an array (clockwise from top-left)
Center point of the bounding box
Width of the bounding box calculated from top-left corner
Height of the bounding box calculated from top-left corner
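The accessors listed above can be pictured with a small stand-in type. SketchQuad and its method names are hypothetical and do not reproduce the crate's PointBox. In particular, reading "calculated from top-left corner" as the length of the top edge (width) and of the left edge (height) is an assumption.

```rust
/// Hypothetical stand-in for a four-corner box; points are (x, y).
#[derive(Clone, Copy, Debug, PartialEq)]
struct SketchQuad {
    tl: (f32, f32),
    tr: (f32, f32),
    br: (f32, f32),
    bl: (f32, f32),
}

impl SketchQuad {
    /// All four corners, clockwise from the top-left.
    fn points(&self) -> [(f32, f32); 4] {
        [self.tl, self.tr, self.br, self.bl]
    }

    /// Arithmetic mean of the four corners.
    fn center(&self) -> (f32, f32) {
        let pts = self.points();
        let sx: f32 = pts.iter().map(|p| p.0).sum();
        let sy: f32 = pts.iter().map(|p| p.1).sum();
        (sx / 4.0, sy / 4.0)
    }

    /// Assumption: width is the top edge length (top-left to top-right).
    fn width(&self) -> f32 {
        dist(self.tl, self.tr)
    }

    /// Assumption: height is the left edge length (top-left to bottom-left).
    fn height(&self) -> f32 {
        dist(self.tl, self.bl)
    }
}

/// Euclidean distance between two points.
fn dist(a: (f32, f32), b: (f32, f32)) -> f32 {
    (b.0 - a.0).hypot(b.1 - a.1)
}
```

Measuring along edges rather than taking axis-aligned extents keeps the values meaningful for rotated text boxes.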
LimitType
Enum defining how the limit_side_len parameter is applied during preprocessing.
Ensure that the shortest side of the image is not less than limit_side_len. Use this to guarantee minimum resolution.
Ensure that the longest side of the image does not exceed limit_side_len. Use this to limit maximum processing size.
ScoreMode
Enum defining the scoring method for detection results.
Calculate the average score for all pixels within the bounding rectangle of the polygon. This is faster but less accurate as it includes pixels outside the actual text region.
Calculate the average score based on all pixels within the original polygon only. This method is relatively slow but more accurate as it only considers actual text pixels.
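The difference between the two scoring methods can be sketched as follows. All function names are hypothetical, the probability map is a nested Vec indexed as prob[y][x], and the point-in-polygon test (even-odd ray casting on integer pixel coordinates) is an assumption rather than the crate's method.

```rust
/// Faster variant: mean of `prob` over the axis-aligned bounding rectangle.
fn score_bounding_rect(prob: &[Vec<f32>], poly: &[(f32, f32)]) -> f32 {
    mean_where(prob, poly, |_, _| true)
}

/// Slower variant: mean of `prob` over pixels that lie inside `poly`.
fn score_polygon(prob: &[Vec<f32>], poly: &[(f32, f32)]) -> f32 {
    mean_where(prob, poly, |x, y| inside(poly, x, y))
}

/// Averages the pixels in the polygon's bounding rectangle that `keep` accepts.
fn mean_where(prob: &[Vec<f32>], poly: &[(f32, f32)], keep: impl Fn(f32, f32) -> bool) -> f32 {
    let h = prob.len();
    let w = prob.first().map_or(0, |row| row.len());
    if h == 0 || w == 0 || poly.is_empty() {
        return 0.0;
    }
    let (mut x0, mut y0, mut x1, mut y1) = (f32::MAX, f32::MAX, f32::MIN, f32::MIN);
    for &(x, y) in poly {
        x0 = x0.min(x);
        y0 = y0.min(y);
        x1 = x1.max(x);
        y1 = y1.max(y);
    }
    // Clamp the rectangle to the map bounds.
    let xs = x0.floor().max(0.0) as usize;
    let ys = y0.floor().max(0.0) as usize;
    let xe = (x1.ceil().max(0.0) as usize).min(w - 1);
    let ye = (y1.ceil().max(0.0) as usize).min(h - 1);
    let (mut sum, mut count) = (0.0f32, 0usize);
    for y in ys..=ye {
        for x in xs..=xe {
            if keep(x as f32, y as f32) {
                sum += prob[y][x];
                count += 1;
            }
        }
    }
    if count == 0 { 0.0 } else { sum / count as f32 }
}

/// Even-odd ray casting test for a point against a simple polygon.
fn inside(poly: &[(f32, f32)], px: f32, py: f32) -> bool {
    if poly.len() < 3 {
        return false;
    }
    let mut hit = false;
    let mut j = poly.len() - 1;
    for i in 0..poly.len() {
        let (xi, yi) = poly[i];
        let (xj, yj) = poly[j];
        // Toggle when a horizontal ray from the point crosses edge (j, i).
        if (yi > py) != (yj > py) && px < (xj - xi) * (py - yi) / (yj - yi) + xi {
            hit = !hit;
        }
        j = i;
    }
    hit
}
```

For a rotated or diamond-shaped region the rectangle covers many background pixels, so the rectangle-based score drops well below the polygon-based score. This is the accuracy cost the faster method accepts.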
Processing Pipeline
The detection processor follows this pipeline:
- Preprocessing:
  - Resize input image according to limit_type and limit_side_len
  - Convert RGB to BGR color space
  - Normalize pixel values: (pixel * scale - mean) / std
  - Permute dimensions from HWC to CHW format
  - Add batch dimension
- Model Inference:
  - Pass preprocessed data to the DB model via worker_fun
  - Model outputs probability map for text regions
- Postprocessing:
  - Threshold probability map using threch to create binary mask
  - Apply morphological dilation if use_dilation is enabled
  - Find contours in the binary mask
  - For each contour:
    - Get minimum area bounding box
    - Filter by min_mini_box_size
    - Calculate confidence score using score_mode
    - Filter by box_thresh
    - Expand region using Vatti clipping with unclip_ratio
    - Get final bounding box and scale to original image coordinates
  - Sort results by position (top-to-bottom, left-to-right)
  - Limit to max_candidates results
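The normalization and HWC-to-CHW steps of the preprocessing stage can be sketched in plain Rust. The function normalize_to_chw is hypothetical and uses nested Vec and fixed-size arrays rather than the crate's array types. The channel swap is omitted, and the batch dimension is implicit, since a flat CHW buffer has the same memory layout as a 1 × C × H × W tensor.

```rust
/// Normalizes an HWC image (`image[y][x][c]`, three channels) with
/// `(pixel * scale - mean) / std` and returns a flat CHW buffer of
/// length `3 * height * width`. Assumes all rows have the same length.
fn normalize_to_chw(
    image: &[Vec<[u8; 3]>],
    scale: f32,
    mean: [f32; 3],
    std: [f32; 3],
) -> Vec<f32> {
    let h = image.len();
    let w = image.first().map_or(0, |row| row.len());
    let mut out = vec![0.0f32; 3 * h * w];
    for (y, row) in image.iter().enumerate() {
        for (x, px) in row.iter().enumerate() {
            for c in 0..3 {
                // CHW index: channel plane first, then row, then column.
                out[c * h * w + y * w + x] = (px[c] as f32 * scale - mean[c]) / std[c];
            }
        }
    }
    out
}
```

Writing each channel into its own contiguous plane is what the HWC-to-CHW permutation amounts to. Most inference runtimes expect this layout for image inputs.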
Example Usage
Performance Considerations
- limit_side_len: Smaller values process faster but may miss small text. Larger values are more accurate but slower.
- score_mode: Fast is recommended for most cases. Use Slow only when accuracy is critical.
- use_dilation: Disabling dilation improves performance but may separate connected text regions.
- unclip_ratio: Lower values (1.2-1.5) are faster but may clip text edges. Higher values (1.6-2.0) ensure full text coverage.
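To make the effect of unclip_ratio concrete: in the DB formulation, the polygon is offset outward by a distance equal to its area times the ratio, divided by its perimeter. The sketch below computes only that distance, and the function names are hypothetical. Whether this crate computes the offset in exactly this way is an assumption, and the polygon offsetting itself (the Vatti clipping step) is not shown.

```rust
/// Absolute area of a simple polygon via the shoelace formula.
fn polygon_area(poly: &[(f64, f64)]) -> f64 {
    let n = poly.len();
    let mut twice_area = 0.0;
    for i in 0..n {
        let (x0, y0) = poly[i];
        let (x1, y1) = poly[(i + 1) % n];
        twice_area += x0 * y1 - x1 * y0;
    }
    twice_area.abs() / 2.0
}

/// Sum of the edge lengths of a closed polygon.
fn polygon_perimeter(poly: &[(f64, f64)]) -> f64 {
    let n = poly.len();
    (0..n)
        .map(|i| {
            let (x0, y0) = poly[i];
            let (x1, y1) = poly[(i + 1) % n];
            (x1 - x0).hypot(y1 - y0)
        })
        .sum()
}

/// Outward offset distance: area * unclip_ratio / perimeter.
/// Returns 0.0 for degenerate polygons with no perimeter.
fn unclip_distance(poly: &[(f64, f64)], unclip_ratio: f64) -> f64 {
    let perimeter = polygon_perimeter(poly);
    if perimeter == 0.0 {
        return 0.0;
    }
    polygon_area(poly) * unclip_ratio / perimeter
}
```

For a 100 × 20 box, a ratio of 1.5 gives an offset of 12.5 pixels on every side, and a ratio of 2.0 gives about 16.7 pixels. The offset scales linearly with the ratio.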
