Overview
The Classification Processor (ClsProcessor) performs orientation angle classification on text images. It determines if text is rotated (0°, 180°, etc.) and can automatically rotate images to the correct orientation.
Source: retto-core/src/processor/cls_processor.rs
ClsProcessor
The main classification processor that determines text orientation angles.
Constructor
Classification processor configuration
Process Method
Mutable reference to a vector of cropped images. Images are automatically rotated in-place if classified as 180° with sufficient confidence.
Worker function that runs model inference on preprocessed batches
Classification results containing angle labels and confidence scores for each image
ClsProcessorConfig
Configuration structure for the classification processor.
Fields
- image_shape: Prediction scale as [channels, height, width]. Images are resized to this shape for classification.
- batch_num: Batch size for direction classifier predictions. Images are processed in batches of this size for efficiency.
- thresh: Prediction threshold. If the model predicts 180 degrees with a score greater than this threshold, the prediction is accepted as 180 degrees and the image is rotated.
- label: The angle values (in degrees) corresponding to each class ID. Index 0 maps to the first angle, index 1 to the second angle, etc.
Example
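A minimal sketch of how the four configuration fields (image_shape, batch_num, thresh, label) fit together. The struct name `ExampleClsConfig`, the field types, and the values are assumptions made for illustration; they are a stand-in, not the crate's actual definition or defaults.

```rust
/// Hypothetical stand-in for ClsProcessorConfig. Field names follow the
/// documentation; the types and values are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct ExampleClsConfig {
    /// [channels, height, width] that each image is resized to.
    image_shape: [usize; 3],
    /// Number of images per inference batch.
    batch_num: usize,
    /// Score a 180-degree prediction must reach before the image is rotated.
    thresh: f32,
    /// Angle in degrees for each class ID (index 0 is class 0, and so on).
    label: Vec<u16>,
}

/// Illustrative values for binary (0/180) classification.
/// These are not claimed to be the crate's defaults.
fn example_binary_config() -> ExampleClsConfig {
    ExampleClsConfig {
        image_shape: [3, 48, 192],
        batch_num: 6,
        thresh: 0.9,
        label: vec![0, 180],
    }
}

/// Number of values in one full batch tensor of shape [batch, C, H, W].
fn batch_tensor_len(cfg: &ExampleClsConfig) -> usize {
    cfg.batch_num * cfg.image_shape.iter().product::<usize>()
}
```

The helper `batch_tensor_len` is only there to show how image_shape and batch_num together determine the memory needed per batch.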
ClsProcessorResult
Result structure containing classification results for all processed images.
Vector of classification results, one per input image, in the same order as the input
Display Format
Implements the Display trait for easy logging.
ClsProcessorSingleResult
Classification result for a single image.
The predicted label containing angle and confidence score
Display Format
Implements the Display trait.
ClsPostProcessLabel
Detailed label information for a classification result.
The predicted rotation angle in degrees (e.g., 0, 180, 90, 270). The value comes from the label array in the configuration.
Confidence score for this prediction (0.0 to 1.0). Higher values indicate more confident predictions.
Processing Pipeline
The classification processor follows this pipeline:
- Batch Preparation:
  - Sort images by aspect ratio (width/height) in descending order
  - Group images into batches of size batch_num
  - Images with similar aspect ratios are processed together for efficiency
- Preprocessing (per batch):
  - Resize each image to image_shape dimensions
  - Normalize pixel values
  - Stack images into a batch tensor (4D array)
- Model Inference:
  - Pass preprocessed batch to the classification model via worker_fun
  - Model outputs class probabilities for each image
- Postprocessing:
  - For each image in the batch:
    - Find the class with maximum probability (argmax)
    - Map class ID to angle using the label array
    - If angle is 180° and score ≥ thresh, rotate the image 180°
  - Store results maintaining original input order
- Image Rotation:
  - Images classified as 180° with confidence ≥ thresh are automatically rotated in-place
  - This ensures downstream processors receive correctly oriented images
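The batch preparation and order-restoring steps can be sketched with the standard library alone. The `Crop` type and the functions `plan_batches` and `restore_order` are hypothetical names introduced here; the crate's internals may differ.

```rust
/// Hypothetical image descriptor; only the dimensions matter for batching.
struct Crop {
    width: u32,
    height: u32,
}

/// Returns batches of original indices, ordered by aspect ratio
/// (width / height) from widest to narrowest, `batch_num` indices per batch.
fn plan_batches(crops: &[Crop], batch_num: usize) -> Vec<Vec<usize>> {
    assert!(batch_num > 0, "batch_num must be positive");
    let ratio = |c: &Crop| c.width as f32 / c.height as f32;
    let mut order: Vec<usize> = (0..crops.len()).collect();
    // Stable sort, descending by ratio; total_cmp gives a total order on f32.
    order.sort_by(|&a, &b| ratio(&crops[b]).total_cmp(&ratio(&crops[a])));
    order.chunks(batch_num).map(|chunk| chunk.to_vec()).collect()
}

/// Writes per-batch outputs back into input order using the saved indices.
fn restore_order<T: Clone + Default>(
    batches: &[Vec<usize>],
    outputs: &[Vec<T>],
    len: usize,
) -> Vec<T> {
    let mut result = vec![T::default(); len];
    for (indices, outs) in batches.iter().zip(outputs) {
        for (&idx, out) in indices.iter().zip(outs) {
            result[idx] = out.clone();
        }
    }
    result
}
```

Keeping the original index next to each image is what lets the results come back in input order even though inference runs on the sorted sequence.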
Angle Classification
The processor supports flexible angle classification:
- Binary (default): 0° and 180° (upright vs. upside-down)
- Quaternary: 0°, 90°, 180°, 270° (all four orientations)
- Custom: Any set of angles defined in the label array
Default Behavior (0° and 180°)
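In the binary case the only action taken is a 180° flip gated by thresh. The sketch below assumes a row-major pixel buffer and uses hypothetical function names. It applies `>=` at the boundary, following the pipeline description; the exact comparison used by the crate is an assumption.

```rust
/// Rotates a row-major pixel buffer by 180 degrees in place.
/// Reversing the pixel order flips both axes at once, which is a 180° turn.
/// Each element of `pixels` must be one whole pixel (for example [u8; 3]).
fn rotate_180_in_place<P>(pixels: &mut [P]) {
    pixels.reverse();
}

/// Hypothetical decision rule: rotate only when the predicted angle is 180
/// and the score reaches `thresh`. Returns whether the image was rotated.
fn apply_binary_result<P>(pixels: &mut [P], angle: u16, score: f32, thresh: f32) -> bool {
    let rotate = angle == 180 && score >= thresh;
    if rotate {
        rotate_180_in_place(pixels);
    }
    rotate
}
```

A low-confidence 180° prediction leaves the image untouched, so a borderline classification cannot flip text that was already upright.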
Multi-Angle Classification
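With more than two classes, the class ID from argmax is looked up in the label array. The following is a sketch with hypothetical helper names; the label order shown in the usage note is an assumption and must match the order of the model's output classes.

```rust
/// Returns (class_id, score) for the highest-probability class,
/// or None if `probs` is empty.
fn argmax(probs: &[f32]) -> Option<(usize, f32)> {
    probs
        .iter()
        .copied()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(&b.1))
}

/// Maps one row of class probabilities to (angle in degrees, score).
/// Returns None if the winning class ID has no entry in `label`.
fn classify(probs: &[f32], label: &[u16]) -> Option<(u16, f32)> {
    let (class_id, score) = argmax(probs)?;
    label.get(class_id).map(|&angle| (angle, score))
}
```

For example, with `label = [0, 90, 180, 270]` and probabilities `[0.05, 0.1, 0.8, 0.05]`, `classify` returns `Some((180, 0.8))`.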
Example Usage
Integration with Detection
The classification processor is typically used after text detection to correct text orientation.
Performance Considerations
- batch_num: Larger batches improve throughput but require more memory. Adjust based on your hardware.
- image_shape: Smaller shapes (e.g., [3, 32, 128]) are faster but may reduce accuracy.
- thresh: Higher thresholds (0.9-0.95) reduce false rotations but may miss some upside-down text.
- Aspect Ratio Sorting: The processor automatically sorts images by aspect ratio to minimize padding waste in batches.
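To see why aspect ratio sorting helps, consider a simplified cost model in which every batch is padded to its widest member. The model and the function names below are assumptions for illustration, not the crate's actual padding logic.

```rust
/// Total padding, in units of aspect ratio, when each batch of `batch_num`
/// images is padded to its widest member. Simplified illustrative model.
/// `batch_num` must be greater than zero.
fn padding_waste(ratios: &[f32], batch_num: usize) -> f32 {
    ratios
        .chunks(batch_num)
        .map(|batch| {
            let widest = batch.iter().copied().fold(f32::MIN, f32::max);
            batch.iter().map(|r| widest - r).sum::<f32>()
        })
        .sum()
}

/// Returns a copy of `ratios` sorted from widest to narrowest.
fn sorted_descending(ratios: &[f32]) -> Vec<f32> {
    let mut sorted = ratios.to_vec();
    sorted.sort_by(|a, b| b.total_cmp(a));
    sorted
}
```

For the ratios `[1.0, 8.0, 1.5, 7.0]` with a batch size of 2, this model gives a waste of 12.5 in the original order and 1.5 after sorting, because similar ratios end up in the same batch.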
