Classification Processor

Overview

The Classification Processor (ClsProcessor) classifies the orientation angle of text images. It determines whether the text in each image is rotated (for example 0° or 180°) and rotates images classified as 180° with sufficient confidence back to the upright orientation. Source: retto-core/src/processor/cls_processor.rs

ClsProcessor

The main classification processor that determines text orientation angles.

Constructor

pub fn new(config: &ClsProcessorConfig) -> Self

config

&ClsProcessorConfig

required

Classification processor configuration

Process Method

fn process<F>(
    &self,
    crop_images: &mut Vec<ImageHelper>,
    worker_fun: F,
) -> RettoResult<ClsProcessorResult>
where
    F: FnMut(Array4<f32>) -> RettoResult<Array2<f32>>

Processes a batch of cropped images to classify their orientation angles.

crop_images

&mut Vec<ImageHelper>

required

Mutable reference to a vector of cropped images. Images are automatically rotated in-place if classified as 180° with sufficient confidence.

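worker_fun

F: FnMut(Array4<f32>) -> RettoResult<Array2<f32>>

required

Worker function that runs model inference on each preprocessed batch. It receives the 4D batch tensor and returns a 2D array of per-image class scores.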

ClsProcessorResult

struct

Classification results containing angle labels and confidence scores for each image

ClsProcessorConfig

Configuration structure for the classification processor.

Fields

image_shape

[usize; 3]

default: [3, 48, 192]

Prediction scale as [channels, height, width]. Images are resized to this shape for classification.

batch_num

usize

default: 6

Batch size for direction classifier predictions. Images are processed in batches of this size for efficiency.

thresh

f32

default: 0.9

Prediction threshold. If the model predicts 180° with a score above this threshold, the final prediction is 180° and the image is rotated.

label

Vec<u16>

default: [0, 180]

The angle values (in degrees) corresponding to each class ID. Index 0 maps to the first angle, index 1 to the second angle, etc.
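The way label and thresh interact can be illustrated with a minimal sketch. The helper name decide_rotation is hypothetical and not part of retto. The inclusive comparison (score >= thresh) is an assumption taken from the pipeline description; the behavior exactly at the threshold should be checked against the source.

```rust
/// Hypothetical helper, not part of retto: maps a class ID to its angle via
/// the `label` array and reports whether the image would be rotated.
fn decide_rotation(
    class_id: usize,
    score: f32,
    label: &[u16],
    thresh: f32,
) -> Option<(u16, bool)> {
    // Index i of `label` holds the angle for class ID i.
    // Returns None when the class ID has no matching entry.
    let angle = *label.get(class_id)?;
    // Assumption: only 180° triggers a rotation, and the comparison is inclusive.
    let rotate = angle == 180 && score >= thresh;
    Some((angle, rotate))
}
```

With the default label of [0, 180] and thresh of 0.9, class 1 with a score of 0.95 yields (180, true), while the same class with a score of 0.5 yields (180, false) and the image is left untouched.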

Example

use retto_core::processor::ClsProcessorConfig;

// Use default configuration (0° and 180° classification)
let config = ClsProcessorConfig::default();

// Custom configuration with different threshold
let custom_config = ClsProcessorConfig {
    image_shape: [3, 48, 192],
    batch_num: 8,
    thresh: 0.85,
    label: vec![0, 180],
};

// Configuration for multi-angle classification
let multi_angle_config = ClsProcessorConfig {
    image_shape: [3, 48, 192],
    batch_num: 6,
    thresh: 0.8,
    label: vec![0, 90, 180, 270],  // Support four orientations
};

ClsProcessorResult

Result structure containing classification results for all processed images.

pub struct ClsProcessorResult(pub Vec<ClsProcessorSingleResult>);

Vec<ClsProcessorSingleResult>

Vector of classification results, one per input image in the same order as input

Display Format

Implements the Display trait for easy logging:

println!("{}", result);  // Prints: [ClsProcessorSingleResult { label: ... }, ...]

ClsProcessorSingleResult

Classification result for a single image.

pub struct ClsProcessorSingleResult {
    pub label: ClsPostProcessLabel,
}

label

ClsPostProcessLabel

The predicted label containing angle and confidence score

Display Format

Implements the Display trait:

println!("{}", single_result);  // Prints: ClsProcessorSingleResult { label: ... }

ClsPostProcessLabel

Detailed label information for a classification result.

pub struct ClsPostProcessLabel {
    pub label: u16,
    pub score: f32,
}

label

u16

The predicted rotation angle in degrees (e.g., 0, 180, 90, 270). The value comes from the label array in the configuration.

score

f32

Confidence score for this prediction (0.0 to 1.0). Higher values indicate more confident predictions.
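As a rough illustration of how a label and score pair could be derived from one row of model output, the sketch below takes the class with the highest probability and looks up its angle. The Label struct is a local stand-in that mirrors ClsPostProcessLabel, and label_from_probs is a hypothetical name; neither is retto's actual implementation.

```rust
/// Local stand-in mirroring the fields of ClsPostProcessLabel.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Label {
    label: u16,
    score: f32,
}

/// Hypothetical helper: argmax over one row of class probabilities,
/// then map the winning class ID to its angle.
fn label_from_probs(probs: &[f32], labels: &[u16]) -> Option<Label> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &p) in probs.iter().enumerate() {
        match best {
            // Keep the current best when it is at least as large (first one wins ties).
            Some((_, s)) if s >= p => {}
            _ => best = Some((i, p)),
        }
    }
    let (class_id, score) = best?;
    // None if the model emitted more classes than there are labels.
    Some(Label { label: *labels.get(class_id)?, score })
}
```

An empty probability row, or a winning class ID with no corresponding label entry, produces None rather than a panic.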

Processing Pipeline

The classification processor follows this pipeline:

1. Batch Preparation:
   - Sort images by aspect ratio (width/height) in descending order
   - Group images into batches of size batch_num
   - Images with similar aspect ratios are processed together for efficiency
2. Preprocessing (per batch):
   - Resize each image to image_shape dimensions
   - Normalize pixel values
   - Stack images into a batch tensor (4D array)
3. Model Inference:
   - Pass preprocessed batch to the classification model via worker_fun
   - Model outputs class probabilities for each image
4. Postprocessing:
   - For each image in the batch:
     - Find the class with maximum probability (argmax)
     - Map class ID to angle using the label array
     - If angle is 180° and score ≥ thresh, rotate the image 180°
   - Store results maintaining original input order
5. Image Rotation:
   - Images classified as 180° with confidence ≥ thresh are automatically rotated in-place
   - This ensures downstream processors receive correctly oriented images
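The batch preparation step and the final reordering of results can be sketched with plain index bookkeeping. This is a simplified illustration under stated assumptions, not retto's code: plan_batches and restore_order are hypothetical names, and images are represented only by their (width, height) pairs.

```rust
/// Hypothetical helper: returns batches of original indices,
/// sorted by aspect ratio (width / height) in descending order.
fn plan_batches(sizes: &[(u32, u32)], batch_num: usize) -> Vec<Vec<usize>> {
    let mut order: Vec<usize> = (0..sizes.len()).collect();
    let ratio = |i: usize| sizes[i].0 as f32 / sizes[i].1 as f32;
    // Stable sort, widest first; total_cmp provides a total order over f32.
    order.sort_by(|&a, &b| ratio(b).total_cmp(&ratio(a)));
    // Guard against a batch size of zero, which would make chunks() panic.
    order.chunks(batch_num.max(1)).map(|c| c.to_vec()).collect()
}

/// Hypothetical helper: scatters results produced in batch order
/// back into the original input order.
fn restore_order<T: Clone>(batches: &[Vec<usize>], flat: &[T]) -> Vec<Option<T>> {
    let n: usize = batches.iter().map(|b| b.len()).sum();
    let mut out = vec![None; n];
    for (pos, &idx) in batches.iter().flatten().enumerate() {
        out[idx] = flat.get(pos).cloned();
    }
    out
}
```

Keeping the original index alongside each image is what allows results to be reported in input order even though inference runs over sorted batches.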

Angle Classification

The processor supports flexible angle classification:

Binary (default): 0° and 180° (upright vs. upside-down)
Quaternary: 0°, 90°, 180°, 270° (all four orientations)
Custom: Any set of angles defined in the label array

The model output should have as many classes as there are labels in the configuration.
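A caller that wants to catch a mismatch between the model and the configuration early could add a check along these lines. check_class_count is a hypothetical helper for illustration; whether retto performs a similar validation internally is not stated here.

```rust
/// Hypothetical helper: verifies that the model's class count matches
/// the number of entries in the configured `label` array.
fn check_class_count(num_classes: usize, label: &[u16]) -> Result<(), String> {
    if num_classes == label.len() {
        Ok(())
    } else {
        Err(format!(
            "model outputs {} classes but label has {} entries",
            num_classes,
            label.len()
        ))
    }
}
```

For example, pairing a two-class model with label set to [0, 90, 180, 270] would return an error instead of silently mapping class IDs to the wrong angles.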

Default Behavior (0° and 180°)

// Default configuration
let config = ClsProcessorConfig {
    label: vec![0, 180],
    thresh: 0.9,
    ..Default::default()
};

// Model outputs 2 classes:
// - Class 0 → 0° (upright)
// - Class 1 → 180° (upside-down)

// If class 1 score ≥ 0.9, image is rotated 180°

Multi-Angle Classification

// Four-angle configuration
let config = ClsProcessorConfig {
    label: vec![0, 90, 180, 270],
    thresh: 0.8,
    ..Default::default()
};

// Model outputs 4 classes:
// - Class 0 → 0° (upright)
// - Class 1 → 90° (rotated right)
// - Class 2 → 180° (upside-down)
// - Class 3 → 270° (rotated left)

// Note: Currently only 180° rotation is automatically applied
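Because only the 180° case is rotated automatically, a caller using a four-angle model would need to handle 90° and 270° results itself. The sketch below shows the underlying index arithmetic on a row-major, single-channel pixel buffer. The function names are hypothetical, the buffer layout is an assumption, and it does not use ImageHelper. Which direction corrects a given label depends on how the model defines its angles.

```rust
/// Rotate a row-major single-channel image by 180°.
/// Reversing the pixel order is sufficient for this case.
fn rotate_180(pixels: &[u8]) -> Vec<u8> {
    pixels.iter().rev().copied().collect()
}

/// Rotate a row-major single-channel image of size w x h by 90° clockwise.
/// The result has width h and height w.
fn rotate_90_cw(pixels: &[u8], w: usize, h: usize) -> Vec<u8> {
    assert_eq!(pixels.len(), w * h, "buffer length must equal w * h");
    let mut out = vec![0u8; w * h];
    for y in 0..h {
        for x in 0..w {
            // Source pixel (x, y) lands at row x, column h - 1 - y.
            out[x * h + (h - 1 - y)] = pixels[y * w + x];
        }
    }
    out
}
```

Applying rotate_90_cw twice (swapping w and h for the second call) gives the same result as rotate_180, and three applications give a 90° counter-clockwise rotation.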

Example Usage

use retto_core::processor::{ClsProcessor, ClsProcessorConfig};
use retto_core::image_helper::ImageHelper;

// Create configuration
let config = ClsProcessorConfig::default();

// Create processor
let processor = ClsProcessor::new(&config);

// Prepare cropped images (e.g., from detection results)
let mut crop_images: Vec<ImageHelper> = vec![/* ... */];

// Process images with a model inference closure.
// `model` is a placeholder for your inference backend; the closure must map
// an Array4<f32> batch to RettoResult<Array2<f32>>.
let results = processor.process(&mut crop_images, |batch| {
    model.run(batch)
})?;

// Access classification results
for (i, result) in results.0.iter().enumerate() {
    println!("Image {}: angle = {}°, confidence = {:.2}",
        i,
        result.label.label,
        result.label.score
    );
    
    if result.label.label == 180 && result.label.score >= config.thresh {
        println!("  → Image was rotated 180°");
    }
}

// Images in crop_images are now correctly oriented

Integration with Detection

The classification processor is typically used after text detection to correct text orientation:

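use retto_core::processor::{DetProcessor, ClsProcessor};
use retto_core::image_helper::ImageHelper;

// `det_processor`, `cls_processor`, `model_det`, `model_cls`, and `crop_region`
// are placeholders for values and helpers defined elsewhere in your pipeline.

// 1. Detect text regions
let det_results = det_processor.process(image, model_det)?;

// 2. Crop detected regions
let mut crop_images: Vec<ImageHelper> = det_results.0
    .iter()
    .map(|det| crop_region(image, &det.boxes))
    .collect();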

// 3. Classify and correct orientation
let cls_results = cls_processor.process(&mut crop_images, model_cls)?;

// 4. Use correctly oriented images for recognition
// crop_images now contains oriented text ready for OCR

Performance Considerations

batch_num: Larger batches improve throughput but require more memory. Adjust based on your hardware.
image_shape: Smaller shapes (e.g., [3, 32, 128]) are faster but may reduce accuracy.
thresh: Higher thresholds (0.9-0.95) reduce false rotations but may miss some upside-down text.
Aspect Ratio Sorting: The processor automatically sorts images by aspect ratio to minimize padding waste in batches.
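To make the memory trade-off concrete, the size of one input batch tensor can be estimated from batch_num and image_shape. This is simple arithmetic over f32 elements; batch_tensor_bytes is a hypothetical helper, and the estimate covers only the input tensor, not model weights or intermediate activations.

```rust
/// Hypothetical helper: size in bytes of one f32 input batch of shape
/// [batch_num, channels, height, width].
fn batch_tensor_bytes(batch_num: usize, image_shape: [usize; 3]) -> usize {
    let [c, h, w] = image_shape;
    batch_num * c * h * w * std::mem::size_of::<f32>()
}
```

With the defaults (batch_num of 6 and image_shape of [3, 48, 192]) this comes to 663,552 bytes per batch. Raising batch_num to 8 gives 884,736 bytes, and shrinking the shape to [3, 32, 128] at batch_num 6 gives 294,912 bytes.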

Common Use Cases

Document Scanning

// High confidence threshold to avoid incorrect rotations
let config = ClsProcessorConfig {
    thresh: 0.95,
    ..Default::default()
};

General OCR Pipeline

// Balanced threshold for general use
let config = ClsProcessorConfig {
    thresh: 0.9,
    batch_num: 8,
    ..Default::default()
};

Low-Quality Images

// Lower threshold to handle uncertain cases
let config = ClsProcessorConfig {
    thresh: 0.8,
    ..Default::default()
};
