Overview

The Classification Processor (ClsProcessor) classifies the orientation angle of cropped text images. It predicts the rotation angle of each crop (0° or 180° by default) and rotates crops classified as 180° back to upright. Source: retto-core/src/processor/cls_processor.rs

ClsProcessor

The main classification processor that determines text orientation angles.

Constructor

pub fn new(config: &ClsProcessorConfig) -> Self
config
&ClsProcessorConfig
required
Classification processor configuration

Process Method

fn process<F>(
    &self,
    crop_images: &mut Vec<ImageHelper>,
    worker_fun: F,
) -> RettoResult<ClsProcessorResult>
where
    F: FnMut(Array4<f32>) -> RettoResult<Array2<f32>>
Processes a batch of cropped images to classify their orientation angles.
crop_images
&mut Vec<ImageHelper>
required
Mutable reference to a vector of cropped images. Images are automatically rotated in-place if classified as 180° with sufficient confidence.
worker_fun
F
required
Worker function that runs model inference on preprocessed batches
ClsProcessorResult
struct
Classification results containing angle labels and confidence scores for each image

ClsProcessorConfig

Configuration structure for the classification processor.

Fields

image_shape
[usize; 3]
default:"[3, 48, 192]"
Prediction scale as [channels, height, width]. Images are resized to this shape for classification.
batch_num
usize
default:"6"
Batch size for direction classifier predictions. Images are processed in batches of this size for efficiency.
thresh
f32
default:"0.9"
Prediction threshold. When the model predicts 180° with a score greater than this threshold, the prediction is accepted as 180° and the image is rotated.
label
Vec<u16>
default:"[0, 180]"
The angle values (in degrees) corresponding to each class ID. Index 0 maps to the first angle, index 1 to the second angle, etc.

Example

use retto_core::processor::ClsProcessorConfig;

// Use default configuration (0° and 180° classification)
let config = ClsProcessorConfig::default();

// Custom configuration with different threshold
let custom_config = ClsProcessorConfig {
    image_shape: [3, 48, 192],
    batch_num: 8,
    thresh: 0.85,
    label: vec![0, 180],
};

// Configuration for multi-angle classification
let multi_angle_config = ClsProcessorConfig {
    image_shape: [3, 48, 192],
    batch_num: 6,
    thresh: 0.8,
    label: vec![0, 90, 180, 270],  // Support four orientations
};

ClsProcessorResult

Result structure containing classification results for all processed images.
pub struct ClsProcessorResult(pub Vec<ClsProcessorSingleResult>);
0
Vec<ClsProcessorSingleResult>
Vector of classification results, one per input image in the same order as input

Display Format

Implements the Display trait for easy logging:
println!("{}", result);  // Prints: [ClsProcessorSingleResult { label: ... }, ...]

ClsProcessorSingleResult

Classification result for a single image.
pub struct ClsProcessorSingleResult {
    pub label: ClsPostProcessLabel,
}
label
ClsPostProcessLabel
The predicted label containing angle and confidence score

Display Format

Implements the Display trait:
println!("{}", single_result);  // Prints: ClsProcessorSingleResult { label: ... }

ClsPostProcessLabel

Detailed label information for a classification result.
pub struct ClsPostProcessLabel {
    pub label: u16,
    pub score: f32,
}
label
u16
The predicted rotation angle in degrees (e.g., 0, 180, 90, 270). The value comes from the label array in the configuration.
score
f32
Confidence score for this prediction (0.0 to 1.0). Higher values indicate more confident predictions.
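
The result types above can be walked to find predictions that deserve a second look. The following is a minimal sketch, not the library's code: the three structs are local stand-ins that mirror the documented field layout so the snippet compiles on its own, and `low_confidence` is a hypothetical helper name.

```rust
// Local stand-ins mirroring the documented result types (assumption: only
// the fields shown in this reference are needed).
pub struct ClsPostProcessLabel {
    pub label: u16,
    pub score: f32,
}

pub struct ClsProcessorSingleResult {
    pub label: ClsPostProcessLabel,
}

pub struct ClsProcessorResult(pub Vec<ClsProcessorSingleResult>);

/// Hypothetical helper: returns (input index, predicted angle) for every
/// prediction whose score is below `min_score`.
pub fn low_confidence(result: &ClsProcessorResult, min_score: f32) -> Vec<(usize, u16)> {
    result
        .0
        .iter()
        .enumerate()
        .filter(|(_, r)| r.label.score < min_score)
        .map(|(i, r)| (i, r.label.label))
        .collect()
}
```

Because results are stored in input order, the returned indices can be used directly against crop_images.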

Processing Pipeline

The classification processor follows this pipeline:
  1. Batch Preparation:
    • Sort images by aspect ratio (width/height) in descending order
    • Group images into batches of size batch_num
    • Images with similar aspect ratios are processed together for efficiency
  2. Preprocessing (per batch):
    • Resize each image to image_shape dimensions
    • Normalize pixel values
    • Stack images into a batch tensor (4D array)
  3. Model Inference:
    • Pass preprocessed batch to the classification model via worker_fun
    • Model outputs class probabilities for each image
  4. Postprocessing:
    • For each image in the batch:
      • Find the class with maximum probability (argmax)
      • Map class ID to angle using the label array
      • If angle is 180° and score ≥ thresh, rotate the image 180°
    • Store results maintaining original input order
  5. Image Rotation:
    • The rotation decided in step 4 is applied in place to the images in crop_images; only 180° is currently rotated automatically, other angles are reported but the image is left as-is
    • This ensures downstream processors receive correctly oriented images
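
The batch preparation step can be sketched as follows. This is an illustration under assumptions, not the processor's actual implementation: `plan_batches` is a hypothetical helper, and image sizes are passed as plain (width, height) pairs instead of ImageHelper values. It sorts indices instead of images, which is one way to keep the original input order recoverable.

```rust
/// Hypothetical helper: groups image indices into batches of at most
/// `batch_num`, ordered by descending aspect ratio (width / height).
/// `sizes` holds (width, height) for each input image.
fn plan_batches(sizes: &[(u32, u32)], batch_num: usize) -> Vec<Vec<usize>> {
    assert!(batch_num > 0, "batch_num must be non-zero");
    let mut order: Vec<usize> = (0..sizes.len()).collect();
    // Sort indices by aspect ratio, widest first. Keeping the original
    // indices lets results be written back in input order later.
    order.sort_by(|&a, &b| {
        let ra = sizes[a].0 as f32 / sizes[a].1 as f32;
        let rb = sizes[b].0 as f32 / sizes[b].1 as f32;
        rb.total_cmp(&ra)
    });
    // Split the sorted indices into consecutive batches.
    order.chunks(batch_num).map(|c| c.to_vec()).collect()
}
```

Each inner vector lists the original positions of the images in one batch, so a result computed for position k of a batch belongs to input image `batch[k]`.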

Angle Classification

The processor supports flexible angle classification:
  • Binary (default): 0° and 180° (upright vs. upside-down)
  • Quaternary: 0°, 90°, 180°, 270° (all four orientations)
  • Custom: Any set of angles defined in the label array
The model output should have as many classes as there are labels in the configuration.
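
The mapping from model output to angle can be sketched as below. This is a simplified stand-in, not the library's code: `Label`, `decode`, and `should_rotate` are hypothetical names, and one row of the model output is passed as a plain slice. The comparison is written as `>=` to follow the pipeline description; treat the exact behavior at the boundary as an assumption.

```rust
/// Stand-in for the documented ClsPostProcessLabel (same two fields).
#[derive(Debug, Clone, PartialEq)]
struct Label {
    label: u16,
    score: f32,
}

/// Picks the most probable class (argmax) and maps its index to an angle.
/// Returns None when `probs` is empty or its length differs from `labels`,
/// since the model must output one class per configured label.
fn decode(probs: &[f32], labels: &[u16]) -> Option<Label> {
    if probs.is_empty() || probs.len() != labels.len() {
        return None;
    }
    let (idx, &score) = probs
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))?;
    Some(Label { label: labels[idx], score })
}

/// Rotation is only considered for a 180° prediction that meets the threshold.
fn should_rotate(prediction: &Label, thresh: f32) -> bool {
    prediction.label == 180 && prediction.score >= thresh
}
```

With the default labels `[0, 180]`, a row of `[0.05, 0.95]` decodes to 180° with score 0.95 and passes a threshold of 0.9, while `[0.3, 0.7]` decodes to 180° but is left unrotated.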

Default Behavior (0° and 180°)

// Default configuration
let config = ClsProcessorConfig {
    label: vec![0, 180],
    thresh: 0.9,
    ..Default::default()
};

// Model outputs 2 classes:
// - Class 0 → 0° (upright)
// - Class 1 → 180° (upside-down)

// If class 1 score ≥ 0.9, image is rotated 180°

Multi-Angle Classification

// Four-angle configuration
let config = ClsProcessorConfig {
    label: vec![0, 90, 180, 270],
    thresh: 0.8,
    ..Default::default()
};

// Model outputs 4 classes:
// - Class 0 → 0° (upright)
// - Class 1 → 90° (rotated right)
// - Class 2 → 180° (upside-down)
// - Class 3 → 270° (rotated left)

// Note: Currently only 180° rotation is automatically applied
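
For reference, a 180° rotation is cheap to apply in place. The sketch below is an assumption about how such a rotation can be done on a raw pixel buffer, not a description of how ImageHelper does it: `rotate_180_in_place` is a hypothetical function operating on a row-major, channel-interleaved byte buffer.

```rust
/// Rotates a row-major, channel-interleaved image buffer by 180° in place.
/// A 180° rotation maps pixel index i to (n - 1 - i), so reversing the
/// order of pixels is sufficient; width and height are not needed.
/// The channel order inside each pixel is preserved.
fn rotate_180_in_place(pixels: &mut [u8], channels: usize) {
    assert!(channels > 0, "channels must be non-zero");
    assert!(
        pixels.len() % channels == 0,
        "buffer length must be a multiple of channels"
    );
    let n = pixels.len() / channels;
    for i in 0..n / 2 {
        let j = n - 1 - i;
        // Swap pixel i with pixel j, one channel at a time.
        for c in 0..channels {
            pixels.swap(i * channels + c, j * channels + c);
        }
    }
}
```

A 90° or 270° rotation would need the image dimensions and cannot be done by a plain reversal, which is one reason handling those angles is more involved.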

Example Usage

use retto_core::processor::{ClsProcessor, ClsProcessorConfig};
use retto_core::image_helper::ImageHelper;

// Create configuration
let config = ClsProcessorConfig::default();

// Create processor
let processor = ClsProcessor::new(&config);

// Prepare cropped images (e.g., from detection results)
let mut crop_images: Vec<ImageHelper> = vec![/* ... */];

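// Process images with a model inference closure.
// `model` is a placeholder for your own inference session; the closure
// receives an Array4<f32> batch and must return RettoResult<Array2<f32>>.
let results = processor.process(&mut crop_images, |batch| {
    // Run your classification model inference here
    model.run(batch)
})?;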

// Access classification results
for (i, result) in results.0.iter().enumerate() {
    println!("Image {}: angle = {}°, confidence = {:.2}",
        i,
        result.label.label,
        result.label.score
    );
    
    if result.label.label == 180 && result.label.score >= config.thresh {
        println!("  → Image was rotated 180°");
    }
}

// Images in crop_images are now correctly oriented

Integration with Detection

The classification processor is typically used after text detection to correct text orientation:
use retto_core::processor::{DetProcessor, ClsProcessor};
use retto_core::image_helper::ImageHelper;

// 1. Detect text regions
let det_results = det_processor.process(image, model_det)?;

// 2. Crop detected regions
// (crop_region stands in for your own cropping helper; it is not defined in this snippet)
let mut crop_images: Vec<ImageHelper> = det_results.0
    .iter()
    .map(|det| crop_region(image, &det.boxes))
    .collect();

// 3. Classify and correct orientation
let cls_results = cls_processor.process(&mut crop_images, model_cls)?;

// 4. Use correctly oriented images for recognition
// crop_images now contains oriented text ready for OCR

Performance Considerations

  • batch_num: Larger batches improve throughput but require more memory. Adjust based on your hardware.
  • image_shape: Smaller shapes (e.g., [3, 32, 128]) are faster but may reduce accuracy.
  • thresh: Higher thresholds (0.9-0.95) reduce false rotations but may miss some upside-down text.
  • Aspect Ratio Sorting: The processor automatically sorts images by aspect ratio to minimize padding waste in batches.
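
To get a feel for the batch_num and image_shape trade-off, the number of inference calls and the size of one input tensor can be estimated with simple arithmetic. The helpers below are hypothetical and only count the f32 input batch; memory used by the model itself is not included.

```rust
/// Number of worker_fun calls needed for `num_images` crops
/// (ceiling division by the batch size).
fn batch_count(num_images: usize, batch_num: usize) -> usize {
    assert!(batch_num > 0, "batch_num must be non-zero");
    (num_images + batch_num - 1) / batch_num
}

/// Bytes occupied by one full f32 input batch of shape
/// [batch_num, channels, height, width].
fn batch_tensor_bytes(batch_num: usize, image_shape: [usize; 3]) -> usize {
    batch_num * image_shape.iter().product::<usize>() * std::mem::size_of::<f32>()
}
```

With the defaults (batch_num 6, image_shape [3, 48, 192]), one batch holds 6 × 3 × 48 × 192 = 165,888 floats, or 663,552 bytes, and 20 crops need 4 inference calls. Shrinking the shape to [3, 32, 128] brings one batch down to 294,912 bytes.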

Common Use Cases

Document Scanning

// High confidence threshold to avoid incorrect rotations
let config = ClsProcessorConfig {
    thresh: 0.95,
    ..Default::default()
};

General OCR Pipeline

// Balanced threshold for general use
let config = ClsProcessorConfig {
    thresh: 0.9,
    batch_num: 8,
    ..Default::default()
};

Low-Quality Images

// Lower threshold so upside-down text is still rotated when the model is
// less confident, at the cost of more false rotations
let config = ClsProcessorConfig {
    thresh: 0.8,
    ..Default::default()
};
