
Overview

Retto returns structured results from each stage of the OCR pipeline. The primary result types are:
  • RettoWorkerResult: Complete results from all three stages
  • RettoWorkerStageResult: Individual stage results (for streaming)
  • DetProcessorResult: Text detection results
  • ClsProcessorResult: Text orientation classification results
  • RecProcessorResult: Text recognition results

RettoWorkerResult

Complete OCR results containing outputs from all three pipeline stages.
pub struct RettoWorkerResult {
    pub det_result: DetProcessorResult,
    pub cls_result: ClsProcessorResult,
    pub rec_result: RecProcessorResult,
}
Returned by RettoSession::run().

Fields

det_result
DetProcessorResult
Text detection results: bounding boxes and confidence scores for all detected text regions.
cls_result
ClsProcessorResult
Text orientation classification results: predicted rotation angles (0° or 180°) for each detected region.
rec_result
RecProcessorResult
Text recognition results: extracted text content and confidence scores for each region.

Example Usage

let result = session.run(image_data)?;

// Results are aligned by index
for i in 0..result.det_result.0.len() {
    let region = &result.det_result.0[i];
    let rotation = &result.cls_result.0[i];
    let text = &result.rec_result.0[i];
    
    println!("Region {}:", i);
    println!("  Position: {:?}", region.boxes);
    println!("  Detection confidence: {:.2}%", region.score * 100.0);
    println!("  Rotation: {}°", rotation.label.label);
    println!("  Text: '{}'", text.text);
    println!("  Recognition confidence: {:.2}%", text.score * 100.0);
}

RettoWorkerStageResult

Enum representing results from individual pipeline stages. Used with the streaming API.
pub enum RettoWorkerStageResult {
    Det(DetProcessorResult),
    Cls(ClsProcessorResult),
    Rec(RecProcessorResult),
}
Returned via channel by RettoSession::run_stream().

Variants

Det
DetProcessorResult
Detection stage has completed.
Cls
ClsProcessorResult
Classification stage has completed.
Rec
RecProcessorResult
Recognition stage has completed (final stage).

Example Usage

use std::sync::mpsc;

let (tx, rx) = mpsc::channel();
session.run_stream(image_data, tx)?;

for stage_result in rx {
    match stage_result {
        RettoWorkerStageResult::Det(det) => {
            println!("Detected {} text regions", det.0.len());
            for (i, region) in det.0.iter().enumerate() {
                println!("  Region {}: score {:.2}", i, region.score);
            }
        }
        RettoWorkerStageResult::Cls(cls) => {
            println!("Classified {} regions", cls.0.len());
        }
        RettoWorkerStageResult::Rec(rec) => {
            println!("Recognized {} texts", rec.0.len());
            for (i, text_result) in rec.0.iter().enumerate() {
                println!("  Text {}: '{}'", i, text_result.text);
            }
        }
    }
}

DetProcessorResult

Results from the text detection stage.
pub struct DetProcessorResult(pub Vec<DetProcessorInnerResult>);

pub struct DetProcessorInnerResult {
    pub boxes: PointBox<OrderedFloat<f32>>,
    pub score: f32,
}

Fields

boxes
PointBox<OrderedFloat<f32>>
Quadrilateral bounding box for the detected text region. Represented as four corner points in clockwise order: top-left, top-right, bottom-right, bottom-left. Coordinates are in the original image space (not the resized input to the detection model).
score
f32
Detection confidence score between 0.0 and 1.0. Higher values indicate higher confidence that the region contains text.

PointBox Methods

The PointBox type provides utility methods:
// Access corner points
let tl = boxes.tl();  // Top-left
let tr = boxes.tr();  // Top-right
let br = boxes.br();  // Bottom-right
let bl = boxes.bl();  // Bottom-left

// Get dimensions
let width = boxes.width_tlc();   // Width at top edge
let height = boxes.height_tlc(); // Height at left edge
let center = boxes.center_point(); // Center point

Example Usage

let det_result = result.det_result;

println!("Found {} text regions", det_result.0.len());

for (i, region) in det_result.0.iter().enumerate() {
    println!("Region {}:", i);
    println!("  Top-left: ({}, {})", region.boxes.tl().x, region.boxes.tl().y);
    println!("  Top-right: ({}, {})", region.boxes.tr().x, region.boxes.tr().y);
    println!("  Bottom-right: ({}, {})", region.boxes.br().x, region.boxes.br().y);
    println!("  Bottom-left: ({}, {})", region.boxes.bl().x, region.boxes.bl().y);
    println!("  Confidence: {:.2}%", region.score * 100.0);
}

Sorting Behavior

Detected regions are automatically sorted by position:
  1. Primary: Top to bottom (Y-coordinate)
  2. Secondary: Left to right (X-coordinate) for regions with similar Y values
Regions are considered at the same vertical position if their Y-coordinates differ by less than 10 pixels.
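The ordering above can be sketched as a comparator. This is an illustrative simplification, not Retto's internal code; the `Tl` struct is a hypothetical stand-in for each region's top-left corner:

```rust
// Sketch of the sorting rule: top-to-bottom, then left-to-right within a row.
// `Tl` is a stand-in for a region's top-left corner, not a Retto type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Tl {
    x: f32,
    y: f32,
}

// Regions whose Y-coordinates differ by less than this are treated as one row.
const Y_TOLERANCE: f32 = 10.0;

fn sort_regions(regions: &mut [Tl]) {
    regions.sort_by(|a, b| {
        if (a.y - b.y).abs() < Y_TOLERANCE {
            // Same row: order left to right by X
            a.x.partial_cmp(&b.x).unwrap()
        } else {
            // Different rows: order top to bottom by Y
            a.y.partial_cmp(&b.y).unwrap()
        }
    });
}

fn main() {
    let mut regions = vec![
        Tl { x: 200.0, y: 52.0 }, // same row as the next region, further right
        Tl { x: 10.0, y: 48.0 },
        Tl { x: 5.0, y: 120.0 },  // a lower row
    ];
    sort_regions(&mut regions);
    println!("{:?}", regions);
}
```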

ClsProcessorResult

Results from the text orientation classification stage.
pub struct ClsProcessorResult(pub Vec<ClsProcessorSingleResult>);

pub struct ClsProcessorSingleResult {
    pub label: ClsPostProcessLabel,
}

pub struct ClsPostProcessLabel {
    pub label: u16,
    pub score: f32,
}

Fields

label
u16
Predicted rotation angle in degrees. Typically either 0 (no rotation) or 180 (upside-down).
score
f32
Classification confidence score between 0.0 and 1.0. Higher values indicate higher confidence in the predicted rotation.

Automatic Rotation

Images are automatically rotated 180° before recognition if:
  • label == 180
  • score >= cls_processor_config.thresh (default 0.9)
This rotation is applied to the cropped text region internally and does not affect the coordinates returned in DetProcessorResult.
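The decision rule can be expressed as a small predicate. This is a sketch of the condition described above, not Retto's internal implementation:

```rust
// Sketch of the automatic-rotation rule: a cropped region is flipped 180°
// only when the classifier predicts the upside-down label AND its confidence
// meets the configured threshold.
fn should_rotate(label: u16, score: f32, thresh: f32) -> bool {
    label == 180 && score >= thresh
}

fn main() {
    let thresh = 0.9; // default value of cls_processor_config.thresh

    assert!(should_rotate(180, 0.95, thresh)); // confident upside-down: rotate
    assert!(!should_rotate(180, 0.60, thresh)); // below threshold: leave as-is
    assert!(!should_rotate(0, 0.99, thresh));   // already upright: leave as-is
    println!("rotation rule holds");
}
```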

Example Usage

let cls_result = result.cls_result;

for (i, cls) in cls_result.0.iter().enumerate() {
    println!("Region {}: rotation {}°, confidence {:.2}%",
        i, cls.label.label, cls.label.score * 100.0);
    
    if cls.label.label == 180 {
        println!("  → Text is upside-down");
    }
}

RecProcessorResult

Results from the text recognition stage.
pub struct RecProcessorResult(pub Vec<RecProcessorSingleResult>);

pub struct RecProcessorSingleResult {
    pub text: String,
    pub score: f32,
}

Fields

text
String
Recognized text content as a UTF-8 string. Decoded from the model’s output using the character dictionary.
score
f32
Average recognition confidence score between 0.0 and 1.0. Calculated as the mean probability of all predicted characters.

Character Decoding

The recognition model outputs character probabilities for each position. The decoder:
  1. Selects the highest probability character at each position
  2. Removes consecutive duplicates (CTC blank collapsing)
  3. Filters out blank tokens and other ignored tokens
  4. Concatenates characters into the final string
The confidence score is the average of all character probabilities before filtering.
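The four steps can be sketched with a toy dictionary. This is an illustrative greedy CTC decoder, not Retto's actual code; it assumes the argmax (step 1) has already produced per-position indices, with index 0 as the blank token:

```rust
// Illustrative greedy CTC decoding (steps 2-4 from above) over a toy dictionary.
// `indices` holds the argmax character index per position (0 = blank);
// `probs` holds each position's argmax probability.
fn ctc_decode(indices: &[usize], probs: &[f32], dict: &[char]) -> (String, f32) {
    let mut text = String::new();
    let mut prev = usize::MAX;
    for &idx in indices {
        // Step 2: collapse consecutive duplicates
        if idx == prev {
            continue;
        }
        prev = idx;
        // Step 3: filter out blank tokens
        if idx == 0 {
            continue;
        }
        // Step 4: append the decoded character
        text.push(dict[idx - 1]);
    }
    // Confidence: mean of all per-position probabilities, before filtering
    let score = probs.iter().sum::<f32>() / probs.len() as f32;
    (text, score)
}

fn main() {
    let dict = ['a', 'b', 'c'];
    let indices = [1, 1, 0, 2, 2, 0, 3];
    let probs = [0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.6];
    let (text, score) = ctc_decode(&indices, &probs, &dict);
    println!("'{}' (avg score {:.2})", text, score); // decodes to "abc"
}
```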

Example Usage

let rec_result = result.rec_result;

for (i, text_result) in rec_result.0.iter().enumerate() {
    println!("Region {}: '{}' (confidence: {:.2}%)",
        i, text_result.text, text_result.score * 100.0);
    
    // Filter low-confidence results
    if text_result.score < 0.5 {
        println!("  ⚠ Low confidence - may be inaccurate");
    }
}

Complete Example

Processing all results together:
use retto_core::prelude::*;

let session = RettoSession::new(RettoSessionConfig::default())?;
let image_data = std::fs::read("document.jpg")?;
let result = session.run(image_data)?;

// Results are aligned by index across all stages
assert_eq!(result.det_result.0.len(), result.cls_result.0.len());
assert_eq!(result.det_result.0.len(), result.rec_result.0.len());

for i in 0..result.det_result.0.len() {
    let det = &result.det_result.0[i];
    let cls = &result.cls_result.0[i];
    let rec = &result.rec_result.0[i];
    
    // Skip low-confidence detections
    if det.score < 0.5 || rec.score < 0.5 {
        continue;
    }
    
    println!("Text region {}:", i);
    println!("  Location: ({:.0}, {:.0}) to ({:.0}, {:.0})",
        det.boxes.tl().x, det.boxes.tl().y,
        det.boxes.br().x, det.boxes.br().y);
    println!("  Orientation: {}°", cls.label.label);
    println!("  Content: '{}'", rec.text);
    println!("  Confidence: det={:.1}%, rec={:.1}%",
        det.score * 100.0, rec.score * 100.0);
    println!();
}

Serialization

All result types support serialization with the serde feature:
[dependencies]
retto-core = { version = "*", features = ["serde"] }
use serde_json;

let result = session.run(image_data)?;
let json = serde_json::to_string_pretty(&result)?;
std::fs::write("results.json", json)?;
This is useful for:
  • Saving results for later processing
  • Sending results over a network
  • Creating datasets for evaluation
  • Debugging and logging
