
Overview

Retto returns structured results from each stage of the OCR pipeline. The primary result types are:
  • RettoWorkerResult: Complete results from all three stages
  • RettoWorkerStageResult: Individual stage results (for streaming)
  • DetProcessorResult: Text detection results
  • ClsProcessorResult: Text orientation classification results
  • RecProcessorResult: Text recognition results

RettoWorkerResult

Complete OCR results containing outputs from all three pipeline stages.
pub struct RettoWorkerResult {
    pub det_result: DetProcessorResult,
    pub cls_result: ClsProcessorResult,
    pub rec_result: RecProcessorResult,
}
Returned by RettoSession::run().

Fields

det_result
DetProcessorResult
Text detection results: bounding boxes and confidence scores for all detected text regions.
cls_result
ClsProcessorResult
Text orientation classification results: predicted rotation angles (0° or 180°) for each detected region.
rec_result
RecProcessorResult
Text recognition results: extracted text content and confidence scores for each region.

Example Usage

let result = session.run(image_data)?;

// Results are aligned by index
for i in 0..result.det_result.0.len() {
    let region = &result.det_result.0[i];
    let rotation = &result.cls_result.0[i];
    let text = &result.rec_result.0[i];
    
    println!("Region {}:", i);
    println!("  Position: {:?}", region.boxes);
    println!("  Detection confidence: {:.2}%", region.score * 100.0);
    println!("  Rotation: {}°", rotation.label.label);
    println!("  Text: '{}'", text.text);
    println!("  Recognition confidence: {:.2}%", text.score * 100.0);
}

RettoWorkerStageResult

Enum representing results from individual pipeline stages. Used with the streaming API.
pub enum RettoWorkerStageResult {
    Det(DetProcessorResult),
    Cls(ClsProcessorResult),
    Rec(RecProcessorResult),
}
Returned via channel by RettoSession::run_stream().

Variants

Det
DetProcessorResult
Detection stage has completed.
Cls
ClsProcessorResult
Classification stage has completed.
Rec
RecProcessorResult
Recognition stage has completed (final stage).

Example Usage

use std::sync::mpsc;

let (tx, rx) = mpsc::channel();
session.run_stream(image_data, tx)?;

for stage_result in rx {
    match stage_result {
        RettoWorkerStageResult::Det(det) => {
            println!("Detected {} text regions", det.0.len());
            for (i, region) in det.0.iter().enumerate() {
                println!("  Region {}: score {:.2}", i, region.score);
            }
        }
        RettoWorkerStageResult::Cls(cls) => {
            println!("Classified {} regions", cls.0.len());
        }
        RettoWorkerStageResult::Rec(rec) => {
            println!("Recognized {} texts", rec.0.len());
            for (i, text_result) in rec.0.iter().enumerate() {
                println!("  Text {}: '{}'", i, text_result.text);
            }
        }
    }
}

DetProcessorResult

Results from the text detection stage.
pub struct DetProcessorResult(pub Vec<DetProcessorInnerResult>);

pub struct DetProcessorInnerResult {
    pub boxes: PointBox<OrderedFloat<f32>>,
    pub score: f32,
}

Fields

boxes
PointBox<OrderedFloat<f32>>
Quadrilateral bounding box for the detected text region. Represented as four corner points in clockwise order: top-left, top-right, bottom-right, bottom-left. Coordinates are in the original image space (not the resized input to the detection model).
score
f32
Detection confidence score between 0.0 and 1.0. Higher values indicate higher confidence that the region contains text.

PointBox Methods

The PointBox type provides utility methods:
// Access corner points
let tl = boxes.tl();  // Top-left
let tr = boxes.tr();  // Top-right
let br = boxes.br();  // Bottom-right
let bl = boxes.bl();  // Bottom-left

// Get dimensions
let width = boxes.width_tlc();   // Width at top edge
let height = boxes.height_tlc(); // Height at left edge
let center = boxes.center_point(); // Center point

Example Usage

let det_result = result.det_result;

println!("Found {} text regions", det_result.0.len());

for (i, region) in det_result.0.iter().enumerate() {
    println!("Region {}:", i);
    println!("  Top-left: ({}, {})", region.boxes.tl().x, region.boxes.tl().y);
    println!("  Top-right: ({}, {})", region.boxes.tr().x, region.boxes.tr().y);
    println!("  Bottom-right: ({}, {})", region.boxes.br().x, region.boxes.br().y);
    println!("  Bottom-left: ({}, {})", region.boxes.bl().x, region.boxes.bl().y);
    println!("  Confidence: {:.2}%", region.score * 100.0);
}

Sorting Behavior

Detected regions are automatically sorted by position:
  1. Primary: Top to bottom (Y-coordinate)
  2. Secondary: Left to right (X-coordinate) for regions with similar Y values
Regions are considered at the same vertical position if their Y-coordinates differ by less than 10 pixels.
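The ordering above can be sketched as a comparator. This is an illustrative simplification, not Retto's internal code; the `Tl` struct is a hypothetical stand-in for each region's top-left corner:

```rust
// Sketch of the sorting rule: top-to-bottom, then left-to-right within a row.
// `Tl` is a stand-in for a region's top-left corner, not a Retto type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Tl {
    x: f32,
    y: f32,
}

// Regions whose Y-coordinates differ by less than this are treated as one row.
const Y_TOLERANCE: f32 = 10.0;

fn sort_regions(regions: &mut [Tl]) {
    regions.sort_by(|a, b| {
        if (a.y - b.y).abs() < Y_TOLERANCE {
            // Same row: order left to right by X
            a.x.partial_cmp(&b.x).unwrap()
        } else {
            // Different rows: order top to bottom by Y
            a.y.partial_cmp(&b.y).unwrap()
        }
    });
}

fn main() {
    let mut regions = vec![
        Tl { x: 200.0, y: 52.0 }, // same row as the next region, further right
        Tl { x: 10.0, y: 48.0 },
        Tl { x: 5.0, y: 120.0 },  // a lower row
    ];
    sort_regions(&mut regions);
    println!("{:?}", regions);
}
```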

ClsProcessorResult

Results from the text orientation classification stage.
pub struct ClsProcessorResult(pub Vec<ClsProcessorSingleResult>);

pub struct ClsProcessorSingleResult {
    pub label: ClsPostProcessLabel,
}

pub struct ClsPostProcessLabel {
    pub label: u16,
    pub score: f32,
}

Fields

label
u16
Predicted rotation angle in degrees. Typically either 0 (no rotation) or 180 (upside-down).
score
f32
Classification confidence score between 0.0 and 1.0. Higher values indicate higher confidence in the predicted rotation.

Automatic Rotation

Images are automatically rotated 180° before recognition if:
  • label == 180
  • score >= cls_processor_config.thresh (default 0.9)
This rotation is applied to the cropped text region internally and does not affect the coordinates returned in DetProcessorResult.
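The decision rule can be expressed as a small predicate. This is a sketch of the condition described above, not Retto's internal implementation:

```rust
// Sketch of the automatic-rotation rule: a cropped region is flipped 180°
// only when the classifier predicts the upside-down label AND its confidence
// meets the configured threshold.
fn should_rotate(label: u16, score: f32, thresh: f32) -> bool {
    label == 180 && score >= thresh
}

fn main() {
    let thresh = 0.9; // default value of cls_processor_config.thresh

    assert!(should_rotate(180, 0.95, thresh)); // confident upside-down: rotate
    assert!(!should_rotate(180, 0.60, thresh)); // below threshold: leave as-is
    assert!(!should_rotate(0, 0.99, thresh));   // already upright: leave as-is
    println!("rotation rule holds");
}
```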

Example Usage

let cls_result = result.cls_result;

for (i, cls) in cls_result.0.iter().enumerate() {
    println!("Region {}: rotation {}°, confidence {:.2}%",
        i, cls.label.label, cls.label.score * 100.0);
    
    if cls.label.label == 180 {
        println!("  → Text is upside-down");
    }
}

RecProcessorResult

Results from the text recognition stage.
pub struct RecProcessorResult(pub Vec<RecProcessorSingleResult>);

pub struct RecProcessorSingleResult {
    pub text: String,
    pub score: f32,
}

Fields

text
String
Recognized text content as a UTF-8 string. Decoded from the model’s output using the character dictionary.
score
f32
Average recognition confidence score between 0.0 and 1.0. Calculated as the mean probability of all predicted characters.

Character Decoding

The recognition model outputs character probabilities for each position. The decoder:
  1. Selects the highest probability character at each position
  2. Removes consecutive duplicates (CTC blank collapsing)
  3. Filters out blank tokens and other ignored tokens
  4. Concatenates characters into the final string
The confidence score is the average of all character probabilities before filtering.
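The four steps can be sketched with a toy dictionary. This is an illustrative greedy CTC decoder, not Retto's actual code; it assumes the argmax (step 1) has already produced per-position indices, with index 0 as the blank token:

```rust
// Illustrative greedy CTC decoding (steps 2-4 from above) over a toy dictionary.
// `indices` holds the argmax character index per position (0 = blank);
// `probs` holds each position's argmax probability.
fn ctc_decode(indices: &[usize], probs: &[f32], dict: &[char]) -> (String, f32) {
    let mut text = String::new();
    let mut prev = usize::MAX;
    for &idx in indices {
        // Step 2: collapse consecutive duplicates
        if idx == prev {
            continue;
        }
        prev = idx;
        // Step 3: filter out blank tokens
        if idx == 0 {
            continue;
        }
        // Step 4: append the decoded character
        text.push(dict[idx - 1]);
    }
    // Confidence: mean of all per-position probabilities, before filtering
    let score = probs.iter().sum::<f32>() / probs.len() as f32;
    (text, score)
}

fn main() {
    let dict = ['a', 'b', 'c'];
    let indices = [1, 1, 0, 2, 2, 0, 3];
    let probs = [0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.6];
    let (text, score) = ctc_decode(&indices, &probs, &dict);
    println!("'{}' (avg score {:.2})", text, score); // decodes to "abc"
}
```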

Example Usage

let rec_result = result.rec_result;

for (i, text_result) in rec_result.0.iter().enumerate() {
    println!("Region {}: '{}' (confidence: {:.2}%)",
        i, text_result.text, text_result.score * 100.0);
    
    // Filter low-confidence results
    if text_result.score < 0.5 {
        println!("  ⚠ Low confidence - may be inaccurate");
    }
}

Complete Example

Processing all results together:
use retto_core::prelude::*;

let session = RettoSession::new(RettoSessionConfig::default())?;
let image_data = std::fs::read("document.jpg")?;
let result = session.run(image_data)?;

// Results are aligned by index across all stages
assert_eq!(result.det_result.0.len(), result.cls_result.0.len());
assert_eq!(result.det_result.0.len(), result.rec_result.0.len());

for i in 0..result.det_result.0.len() {
    let det = &result.det_result.0[i];
    let cls = &result.cls_result.0[i];
    let rec = &result.rec_result.0[i];
    
    // Skip low-confidence detections
    if det.score < 0.5 || rec.score < 0.5 {
        continue;
    }
    
    println!("Text region {}:", i);
    println!("  Location: ({:.0}, {:.0}) to ({:.0}, {:.0})",
        det.boxes.tl().x, det.boxes.tl().y,
        det.boxes.br().x, det.boxes.br().y);
    println!("  Orientation: {}°", cls.label.label);
    println!("  Content: '{}'", rec.text);
    println!("  Confidence: det={:.1}%, rec={:.1}%",
        det.score * 100.0, rec.score * 100.0);
    println!();
}

Serialization

All result types support serialization with the serde feature:
[dependencies]
retto-core = { version = "*", features = ["serde"] }
use serde_json;

let result = session.run(image_data)?;
let json = serde_json::to_string_pretty(&result)?;
std::fs::write("results.json", json)?;
This is useful for:
  • Saving results for later processing
  • Sending results over a network
  • Creating datasets for evaluation
  • Debugging and logging
