Overview
Retto returns structured results from each stage of the OCR pipeline. The primary result types are:
- RettoWorkerResult: Complete results from all three stages
- RettoWorkerStageResult: Individual stage results (for streaming)
- DetProcessorResult: Text detection results
- ClsProcessorResult: Text orientation classification results
- RecProcessorResult: Text recognition results
RettoWorkerResult
Complete OCR results containing outputs from all three pipeline stages.
pub struct RettoWorkerResult {
    pub det_result: DetProcessorResult,
    pub cls_result: ClsProcessorResult,
    pub rec_result: RecProcessorResult,
}
Returned by RettoSession::run().
Fields
- det_result (DetProcessorResult): Bounding boxes and confidence scores for all detected text regions.
- cls_result (ClsProcessorResult): Predicted rotation angle (0° or 180°) for each detected region.
- rec_result (RecProcessorResult): Extracted text content and confidence score for each region.
Example Usage
let result = session.run(image_data)?;
// Results are aligned by index
for i in 0..result.det_result.0.len() {
    let bbox = &result.det_result.0[i];
    let rotation = &result.cls_result.0[i];
    let text = &result.rec_result.0[i];
    println!("Region {}:", i);
    println!(" Position: {:?}", bbox.boxes);
    println!(" Detection confidence: {:.2}%", bbox.score * 100.0);
    println!(" Rotation: {}°", rotation.label.label);
    println!(" Text: '{}'", text.text);
    println!(" Recognition confidence: {:.2}%", text.score * 100.0);
}
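Because the three result lists are aligned by index, the same walk can be written with iterator zips instead of manual indexing. The following is a minimal sketch, not library code: `Det`, `Cls`, and `Rec` are stand-in structs that mirror only the fields used on this page, and `summarize` is a hypothetical helper.

```rust
// Stand-in types mirroring only the fields used on this page;
// in real code these come from retto-core.
struct Det { score: f32 }
struct Cls { label: u16 }
struct Rec { text: String, score: f32 }

/// Hypothetical helper: builds one summary line per region by zipping
/// the three index-aligned result lists.
fn summarize(det: &[Det], cls: &[Cls], rec: &[Rec]) -> Vec<String> {
    det.iter()
        .zip(cls)
        .zip(rec)
        .enumerate()
        .map(|(i, ((d, c), r))| {
            format!(
                "Region {}: '{}' rotation={} det={:.0}% rec={:.0}%",
                i, r.text, c.label, d.score * 100.0, r.score * 100.0
            )
        })
        .collect()
}

fn main() {
    let det = vec![Det { score: 0.75 }];
    let cls = vec![Cls { label: 0 }];
    let rec = vec![Rec { text: "hello".to_string(), score: 0.5 }];
    for line in summarize(&det, &cls, &rec) {
        println!("{line}");
    }
}
```

Note that `zip` stops at the shortest input, so a length mismatch silently drops regions instead of panicking; keep an explicit length check if a mismatch should be treated as an error.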
RettoWorkerStageResult
Enum representing the result of an individual pipeline stage. Used with the streaming API.
pub enum RettoWorkerStageResult {
    Det(DetProcessorResult),
    Cls(ClsProcessorResult),
    Rec(RecProcessorResult),
}
Returned via channel by RettoSession::run_stream().
Variants
- Det(DetProcessorResult): The detection stage has completed.
- Cls(ClsProcessorResult): The classification stage has completed.
- Rec(RecProcessorResult): The recognition stage has completed (final stage).
Example Usage
use std::sync::mpsc;
let (tx, rx) = mpsc::channel();
session.run_stream(image_data, tx)?;
for stage_result in rx {
    match stage_result {
        RettoWorkerStageResult::Det(det) => {
            println!("Detected {} text regions", det.0.len());
            for (i, region) in det.0.iter().enumerate() {
                println!(" Region {}: score {:.2}", i, region.score);
            }
        }
        RettoWorkerStageResult::Cls(cls) => {
            println!("Classified {} regions", cls.0.len());
        }
        RettoWorkerStageResult::Rec(rec) => {
            println!("Recognized {} texts", rec.0.len());
            for (i, text_result) in rec.0.iter().enumerate() {
                println!(" Text {}: '{}'", i, text_result.text);
            }
        }
    }
}
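The receiving loop above ends only when every sender has been dropped, which is standard `std::sync::mpsc` behaviour. The sketch below shows that consumer pattern in isolation, so it runs without the library: `Stage` is a stand-in for `RettoWorkerStageResult`, the producer thread stands in for whatever `run_stream` does internally (an assumption, not a description of its implementation), and `collect_texts` is a hypothetical helper.

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for RettoWorkerStageResult; Det and Cls carry only region counts.
enum Stage {
    Det(usize),
    Cls(usize),
    Rec(Vec<String>),
}

/// Hypothetical consumer: drains the channel and returns the texts
/// delivered by the final (Rec) stage.
fn collect_texts(rx: mpsc::Receiver<Stage>) -> Vec<String> {
    let mut texts = Vec::new();
    for stage in rx {
        match stage {
            Stage::Det(n) => println!("detected {n} regions"),
            Stage::Cls(n) => println!("classified {n} regions"),
            Stage::Rec(t) => texts = t,
        }
    }
    texts
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Stand-in producer; in real code the pipeline sends the stages.
    let producer = thread::spawn(move || {
        tx.send(Stage::Det(2)).unwrap();
        tx.send(Stage::Cls(2)).unwrap();
        tx.send(Stage::Rec(vec!["hello".into(), "world".into()])).unwrap();
        // `tx` is dropped here, which ends the receiver loop.
    });
    let texts = collect_texts(rx);
    producer.join().unwrap();
    println!("{texts:?}");
}
```

Handling each stage as it arrives lets a UI draw bounding boxes after detection without waiting for recognition to finish.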
DetProcessorResult
Results from the text detection stage.
pub struct DetProcessorResult(pub Vec<DetProcessorInnerResult>);
pub struct DetProcessorInnerResult {
    pub boxes: PointBox<OrderedFloat<f32>>,
    pub score: f32,
}
Fields
- boxes (PointBox<OrderedFloat<f32>>): Quadrilateral bounding box for the detected text region, represented as four corner points in clockwise order: top-left, top-right, bottom-right, bottom-left. Coordinates are in the original image space (not the resized input to the detection model).
- score (f32): Detection confidence score between 0.0 and 1.0. Higher values indicate higher confidence that the region contains text.
PointBox Methods
The PointBox type provides utility methods:
// Access corner points
let tl = boxes.tl(); // Top-left
let tr = boxes.tr(); // Top-right
let br = boxes.br(); // Bottom-right
let bl = boxes.bl(); // Bottom-left
// Get dimensions
let width = boxes.width_tlc(); // Width at top edge
let height = boxes.height_tlc(); // Height at left edge
let center = boxes.center_point(); // Center point
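Since a detected box is a general quadrilateral, cropping or drawing code often needs the enclosing axis-aligned rectangle. Here is a minimal sketch under stated assumptions: `Quad` is a stand-in that holds the four corners as plain `(f32, f32)` pairs (a real `PointBox<OrderedFloat<f32>>` would need its coordinates converted to `f32` first), and `bounding_rect` is a hypothetical helper, not a `PointBox` method.

```rust
/// Stand-in for a quadrilateral: corners in clockwise order
/// (top-left, top-right, bottom-right, bottom-left) as (x, y) pairs.
type Quad = [(f32, f32); 4];

/// Hypothetical helper: smallest axis-aligned rectangle containing the
/// quad, returned as (min_x, min_y, max_x, max_y).
fn bounding_rect(quad: &Quad) -> (f32, f32, f32, f32) {
    let mut rect = (f32::INFINITY, f32::INFINITY, f32::NEG_INFINITY, f32::NEG_INFINITY);
    for &(x, y) in quad {
        rect.0 = rect.0.min(x);
        rect.1 = rect.1.min(y);
        rect.2 = rect.2.max(x);
        rect.3 = rect.3.max(y);
    }
    rect
}

fn main() {
    // A slightly skewed text line.
    let quad: Quad = [(10.0, 20.0), (110.0, 25.0), (108.0, 55.0), (8.0, 50.0)];
    let (min_x, min_y, max_x, max_y) = bounding_rect(&quad);
    println!("x: {min_x}..{max_x}, y: {min_y}..{max_y}");
}
```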
Example Usage
let det_result = result.det_result;
println!("Found {} text regions", det_result.0.len());
for (i, region) in det_result.0.iter().enumerate() {
    println!("Region {}:", i);
    println!(" Top-left: ({}, {})", region.boxes.tl().x, region.boxes.tl().y);
    println!(" Top-right: ({}, {})", region.boxes.tr().x, region.boxes.tr().y);
    println!(" Bottom-right: ({}, {})", region.boxes.br().x, region.boxes.br().y);
    println!(" Bottom-left: ({}, {})", region.boxes.bl().x, region.boxes.bl().y);
    println!(" Confidence: {:.2}%", region.score * 100.0);
}
Sorting Behavior
Detected regions are automatically sorted by position:
- Primary: Top to bottom (Y-coordinate)
- Secondary: Left to right (X-coordinate) for regions with similar Y values
Regions are considered at the same vertical position if their Y-coordinates differ by less than 10 pixels.
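The ordering described above can be approximated as follows. This is one possible interpretation, not the library's actual sorting code: it works on a single hypothetical anchor point per region (for example the top-left corner), and `reading_order` and `row_tol` are illustrative names. A pairwise comparator with a tolerance is not a consistent total order, so the sketch groups points into rows first and then sorts each row.

```rust
/// Hypothetical sketch of reading-order sorting for (x, y) anchor points.
/// `row_tol` plays the role of the 10-pixel threshold described above.
fn reading_order(mut pts: Vec<(f32, f32)>, row_tol: f32) -> Vec<(f32, f32)> {
    // 1. Sort top to bottom.
    pts.sort_by(|a, b| a.1.total_cmp(&b.1));

    // 2. Split into rows: a point joins the current row while its Y gap
    //    to the previous point stays below `row_tol`.
    let mut rows: Vec<Vec<(f32, f32)>> = Vec::new();
    for p in pts {
        match rows.last_mut() {
            Some(row) if p.1 - row.last().unwrap().1 < row_tol => row.push(p),
            _ => rows.push(vec![p]),
        }
    }

    // 3. Sort each row left to right, then flatten.
    for row in &mut rows {
        row.sort_by(|a, b| a.0.total_cmp(&b.0));
    }
    rows.into_iter().flatten().collect()
}

fn main() {
    let pts = vec![(200.0, 12.0), (10.0, 15.0), (50.0, 40.0), (5.0, 44.0)];
    for (x, y) in reading_order(pts, 10.0) {
        println!("({x}, {y})");
    }
}
```

In this input the first two points share a row (their Y values differ by 3), so the one at x = 10 comes before the one at x = 200 even though it sits slightly lower.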
ClsProcessorResult
Results from the text orientation classification stage.
pub struct ClsProcessorResult(pub Vec<ClsProcessorSingleResult>);
pub struct ClsProcessorSingleResult {
    pub label: ClsPostProcessLabel,
}
pub struct ClsPostProcessLabel {
    pub label: u16,
    pub score: f32,
}
Fields
- label.label (u16): Predicted rotation angle in degrees. Typically either 0 (no rotation) or 180 (upside-down).
- label.score (f32): Classification confidence score between 0.0 and 1.0. Higher values indicate higher confidence in the predicted rotation.
Automatic Rotation
A region is automatically rotated 180° before recognition when both of the following hold:
- label == 180
- score >= cls_processor_config.thresh (default 0.9)
This rotation is applied to the cropped text region internally and does not affect the coordinates returned in DetProcessorResult.
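The rotation rule can be sketched as a small predicate plus a buffer flip. Both functions are hypothetical illustrations rather than the library's internals, and the sketch assumes a row-major buffer with one element per pixel, where a 180° rotation is the same as reversing the buffer.

```rust
/// Hypothetical predicate mirroring the two conditions above.
fn should_rotate(label: u16, score: f32, thresh: f32) -> bool {
    label == 180 && score >= thresh
}

/// Rotates a row-major image buffer by 180 degrees. With one element per
/// pixel, pixel (x, y) moves to (w - 1 - x, h - 1 - y), which is exactly
/// the reversed buffer order.
fn rotate_180<T: Clone>(pixels: &[T]) -> Vec<T> {
    pixels.iter().rev().cloned().collect()
}

fn main() {
    let thresh = 0.9; // default threshold quoted above
    let crop = vec![1u8, 2, 3, 4, 5, 6]; // a 3x2 single-channel crop
    let crop = if should_rotate(180, 0.95, thresh) {
        rotate_180(&crop)
    } else {
        crop
    };
    println!("{crop:?}");
}
```

A region labelled 180 with a score below the threshold is left as-is, so a low-confidence "upside-down" prediction does not flip text that is probably upright.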
Example Usage
let cls_result = result.cls_result;
for (i, cls) in cls_result.0.iter().enumerate() {
    println!("Region {}: rotation {}°, confidence {:.2}%",
        i, cls.label.label, cls.label.score * 100.0);
    if cls.label.label == 180 {
        println!(" → Text is upside-down");
    }
}
RecProcessorResult
Results from the text recognition stage.
pub struct RecProcessorResult(pub Vec<RecProcessorSingleResult>);
pub struct RecProcessorSingleResult {
    pub text: String,
    pub score: f32,
}
Fields
- text (String): Recognized text content as a UTF-8 string. Decoded from the model’s output using the character dictionary.
- score (f32): Average recognition confidence score between 0.0 and 1.0. Calculated as the mean probability of all predicted characters.
Character Decoding
The recognition model outputs character probabilities for each position. The decoder:
- Selects the highest probability character at each position
- Collapses consecutive duplicate predictions into one (CTC repeat collapsing)
- Filters out blank tokens and other ignored tokens
- Concatenates characters into the final string
The confidence score is the average of all character probabilities before filtering.
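The decoding steps above correspond to greedy CTC decoding, sketched below. This is an illustration, not the library's decoder: it assumes dictionary index 0 is the blank token, ignores any other filtered tokens, and computes the score as the mean of the per-position maximum probabilities, which is one reading of "before filtering". `ctc_greedy_decode` is a hypothetical name.

```rust
/// Greedy CTC decoding sketch. `probs[t][k]` is the probability of
/// dictionary entry `k` at position `t`; index 0 is assumed to be blank.
fn ctc_greedy_decode(probs: &[Vec<f32>], dict: &[char]) -> (String, f32) {
    let mut text = String::new();
    let mut prev: Option<usize> = None;
    let mut sum = 0.0f32;
    for row in probs {
        // Step 1: pick the most probable entry at this position.
        let (idx, p) = row
            .iter()
            .copied()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .expect("each position needs at least one probability");
        sum += p;
        // Steps 2-3: skip repeats of the previous index and skip blanks.
        if Some(idx) != prev && idx != 0 {
            // Step 4: append the decoded character.
            text.push(dict[idx]);
        }
        prev = Some(idx);
    }
    let score = if probs.is_empty() { 0.0 } else { sum / probs.len() as f32 };
    (text, score)
}

fn main() {
    let dict = ['_', 'a', 'b']; // index 0 is the blank placeholder
    let probs = vec![
        vec![0.25, 0.5, 0.25], // a
        vec![0.25, 0.5, 0.25], // a (repeat, collapsed)
        vec![0.5, 0.25, 0.25], // blank
        vec![0.25, 0.5, 0.25], // a (kept: separated by a blank)
        vec![0.0, 0.25, 0.75], // b
    ];
    let (text, score) = ctc_greedy_decode(&probs, &dict);
    println!("{text} ({score:.2})");
}
```

The blank token is what allows genuine double letters: two `a` predictions separated by a blank decode to two characters, while two adjacent `a` predictions collapse into one.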
Example Usage
let rec_result = result.rec_result;
for (i, text_result) in rec_result.0.iter().enumerate() {
    println!("Region {}: '{}' (confidence: {:.2}%)",
        i, text_result.text, text_result.score * 100.0);
    // Flag low-confidence results
    if text_result.score < 0.5 {
        println!(" ⚠ Low confidence - may be inaccurate");
    }
}
Complete Example
Processing all results together:
use retto_core::prelude::*;
let session = RettoSession::new(RettoSessionConfig::default())?;
let image_data = std::fs::read("document.jpg")?;
let result = session.run(image_data)?;
// Results are aligned by index across all stages
assert_eq!(result.det_result.0.len(), result.cls_result.0.len());
assert_eq!(result.det_result.0.len(), result.rec_result.0.len());
for i in 0..result.det_result.0.len() {
    let det = &result.det_result.0[i];
    let cls = &result.cls_result.0[i];
    let rec = &result.rec_result.0[i];
    // Skip regions with low detection or recognition confidence
    if det.score < 0.5 || rec.score < 0.5 {
        continue;
    }
    println!("Text region {}:", i);
    println!(" Location: ({:.0}, {:.0}) to ({:.0}, {:.0})",
        det.boxes.tl().x, det.boxes.tl().y,
        det.boxes.br().x, det.boxes.br().y);
    println!(" Orientation: {}°", cls.label.label);
    println!(" Content: '{}'", rec.text);
    println!(" Confidence: det={:.1}%, rec={:.1}%",
        det.score * 100.0, rec.score * 100.0);
    println!();
}
Serialization
All result types support serialization with the serde feature:
[dependencies]
retto-core = { version = "*", features = ["serde"] }
use serde_json;
let result = session.run(image_data)?;
let json = serde_json::to_string_pretty(&result)?;
std::fs::write("results.json", json)?;
This is useful for:
- Saving results for later processing
- Sending results over a network
- Creating datasets for evaluation
- Debugging and logging