
Overview

RettoSession<W: RettoWorker> is the primary interface for executing OCR operations in Retto. It manages the complete OCR pipeline including text detection, classification, and recognition.

Type Definition

pub struct RettoSession<W: RettoWorker> {
    worker: W,
    rec_character: RecCharacter,
    config: RettoSessionConfig<W>,
}
The session is generic over a RettoWorker type, which performs the actual model inference. Implementations include RettoOrtWorker, which runs models through ONNX Runtime.
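
The generic-worker design can be illustrated with a minimal, self-contained sketch. Everything in it (the Worker trait, MockWorker, Session, and the infer method) is a hypothetical stand-in invented for illustration; it does not reproduce Retto's actual RettoWorker trait or its methods. It only shows why a session that is generic over its worker can swap inference backends at compile time, for example to use a mock in tests.

```rust
// Stand-in types for illustration only; these are NOT Retto's definitions.
trait Worker {
    /// Runs "inference" on the input and returns one string per result.
    fn infer(&mut self, input: &[u8]) -> Vec<String>;
}

/// A mock worker that needs no model files, which is handy in unit tests.
struct MockWorker {
    calls: usize,
}

impl Worker for MockWorker {
    fn infer(&mut self, input: &[u8]) -> Vec<String> {
        self.calls += 1;
        vec![format!("{} bytes", input.len())]
    }
}

/// The session owns its worker and is generic over the worker type,
/// so the backend is selected at compile time (static dispatch).
struct Session<W: Worker> {
    worker: W,
}

impl<W: Worker> Session<W> {
    fn new(worker: W) -> Self {
        Session { worker }
    }

    /// Takes &mut self because the worker may hold mutable inference state.
    fn run(&mut self, input: &[u8]) -> Vec<String> {
        self.worker.infer(input)
    }
}

fn main() {
    let mut session = Session::new(MockWorker { calls: 0 });
    let out = session.run(&[1, 2, 3]);
    println!("{:?} after {} call(s)", out, session.worker.calls);
}
```

With this shape, code written against Session<W> does not need to change when a different worker is plugged in; only the type parameter does.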

Constructor

new()

Creates a new OCR session with the specified configuration.
pub fn new(cfg: RettoSessionConfig<W>) -> RettoResult<Self>
cfg (RettoSessionConfig<W>, required): Configuration for the session, including worker config and processor settings
Returns
RettoResult<RettoSession<W>>: A new session instance, or an error if initialization fails (e.g., model loading errors)

Example

use retto_core::prelude::*;

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig::default(),
    max_side_len: 2000,
    min_side_len: 30,
    det_processor_config: DetProcessorConfig::default(),
    cls_processor_config: ClsProcessorConfig::default(),
    rec_processor_config: RecProcessorConfig::default(),
};

// `mut` is needed because run() and run_stream() take &mut self
let mut session = RettoSession::new(cfg)?;

Methods

run()

Executes the complete OCR pipeline on an input image and returns all results.
pub fn run(&mut self, input: impl AsRef<[u8]>) -> RettoResult<RettoWorkerResult>
input (impl AsRef<[u8]>, required): Encoded image bytes (the contents of an image file, not decoded pixels) in any format supported by the image crate (PNG, JPEG, etc.)
Returns
RettoResult<RettoWorkerResult>: Combined results from all three stages: detection, classification, and recognition

Processing Pipeline

The run() method executes three sequential stages:
  1. Detection: Locates text regions in the image
  2. Classification: Determines text orientation (0° or 180°)
  3. Recognition: Extracts text content from each region

Example

let image_data = std::fs::read("input.png")?;
let result = session.run(image_data)?;

// Access detection results
for det in &result.det_result.0 {
    println!("Text box: {:?}, score: {}", det.boxes, det.score);
}

// Access classification results
for cls in &result.cls_result.0 {
    println!("Orientation: {}°, score: {}", cls.label.label, cls.label.score);
}

// Access recognition results
for rec in &result.rec_result.0 {
    println!("Text: {}, confidence: {}", rec.text, rec.score);
}

run_stream()

Executes the OCR pipeline and delivers each stage's result through a channel as that stage completes.
pub fn run_stream(
    &mut self,
    input: impl AsRef<[u8]>,
    sender: mpsc::Sender<RettoWorkerStageResult>,
) -> RettoResult<()>
input (impl AsRef<[u8]>, required): Encoded image bytes in any format supported by the image crate
sender (mpsc::Sender<RettoWorkerStageResult>, required): Channel sender through which stage results are delivered as each stage completes; the caller reads them from the matching receiver
Returns
RettoResult<()>: Ok(()) if the pipeline completes successfully. The results themselves are sent via the channel.

Use Case

The streaming API is useful when you need to:
  • Process results as soon as each stage completes
  • Display progressive results in a UI
  • Implement early termination based on intermediate results

Example

use std::sync::mpsc;

let (tx, rx) = mpsc::channel();
let image_data = std::fs::read("input.png")?;

// Consume results on a separate thread. The session stays on the current
// thread, so this works even though RettoSession is not Send.
let consumer = std::thread::spawn(move || {
    // Iteration ends once the sender has been dropped
    for stage_result in rx {
        match stage_result {
            RettoWorkerStageResult::Det(det) => {
                println!("Detection complete: {} regions found", det.0.len());
            }
            RettoWorkerStageResult::Cls(cls) => {
                println!("Classification complete: {} results", cls.0.len());
            }
            RettoWorkerStageResult::Rec(rec) => {
                println!("Recognition complete: {} texts extracted", rec.0.len());
            }
        }
    }
});

// Blocks until the pipeline finishes; `tx` is moved into the call
session.run_stream(image_data, tx)?;
consumer.join().expect("consumer thread panicked");
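
The early-termination use case listed above can be sketched on the consumer side using only the standard library. The StageResult enum, produce, and consume below are hypothetical stand-ins with simplified payloads, not Retto's types. This page does not state whether run_stream itself stops when the receiver is dropped, so the simulated producer's behavior here (stopping on a failed send) is an assumption made purely for illustration.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for RettoWorkerStageResult; payloads are simplified.
#[derive(Debug, PartialEq)]
enum StageResult {
    Det(usize),       // number of detected regions
    Cls(usize),       // number of classified regions
    Rec(Vec<String>), // recognized strings
}

/// Simulates a three-stage producer. It stops as soon as a send fails,
/// which happens once the receiver has been dropped.
/// Returns how many stage results were accepted by the channel.
fn produce(regions: usize, tx: mpsc::Sender<StageResult>) -> usize {
    let stages = vec![
        StageResult::Det(regions),
        StageResult::Cls(regions),
        StageResult::Rec(vec!["text".to_string(); regions]),
    ];
    let mut sent = 0;
    for stage in stages {
        if tx.send(stage).is_err() {
            break; // consumer went away: skip the remaining stages
        }
        sent += 1;
    }
    sent
}

/// Consumes stage results and stops early when detection finds no regions.
fn consume(rx: mpsc::Receiver<StageResult>) -> Vec<StageResult> {
    let mut seen = Vec::new();
    for stage in rx {
        let stop = matches!(stage, StageResult::Det(0));
        seen.push(stage);
        if stop {
            break; // the receiver is dropped here, so later sends fail
        }
    }
    seen
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || produce(0, tx));
    let seen = consume(rx);
    let sent = producer.join().unwrap();
    println!("consumed {:?}; channel accepted {} stage(s)", seen, sent);
}
```

Because the channel is buffered, the producer may already have queued later stages by the time the consumer stops; the consumer simply never reads them.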

Image Processing

RettoSession handles image preparation automatically:
  1. Decodes the input image from the encoded bytes
  2. Resizes the image, preserving its aspect ratio, to satisfy the max_side_len and min_side_len constraints
  3. Scales coordinates in the results back to the original image dimensions
The resize limits are set in RettoSessionConfig.
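
The arithmetic behind aspect-preserving resizing and coordinate restoration can be sketched as follows. The exact rule Retto applies is not documented on this page, so the rule below is an assumption: shrink when the longer side exceeds max_side_len, enlarge when the shorter side is below min_side_len, and otherwise leave the image unchanged. The function names scale_factor and to_original are hypothetical.

```rust
/// Computes a uniform scale factor for a width x height image.
/// Assumed rule (not confirmed for Retto): shrink when the longer side
/// exceeds `max_side_len`, enlarge when the shorter side is below
/// `min_side_len`, otherwise keep the original size.
fn scale_factor(width: u32, height: u32, max_side_len: u32, min_side_len: u32) -> f64 {
    let longer = f64::from(width.max(height));
    let shorter = f64::from(width.min(height));
    if longer > f64::from(max_side_len) {
        f64::from(max_side_len) / longer
    } else if shorter < f64::from(min_side_len) {
        f64::from(min_side_len) / shorter
    } else {
        1.0
    }
}

/// Maps a point found in the resized image back to original-image pixels.
/// Using one factor for both axes is what preserves the aspect ratio.
fn to_original(point: (f64, f64), scale: f64) -> (f64, f64) {
    (point.0 / scale, point.1 / scale)
}

fn main() {
    // A 4000x3000 image with max_side_len = 2000 is scaled by 0.5.
    let scale = scale_factor(4000, 3000, 2000, 30);
    // A box corner detected at (500, 250) in the resized image
    // corresponds to (1000, 500) in the original.
    let original = to_original((500.0, 250.0), scale);
    println!("scale = {}, original point = {:?}", scale, original);
}
```

With the values from the constructor example (max_side_len: 2000, min_side_len: 30), a 640x480 image would be left at its original size under this assumed rule.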

Thread Safety

RettoSession is not Send or Sync by default. For multi-threaded usage:
  • Create one session per thread
  • Use a thread pool pattern with session-per-worker
  • Share configuration but not session instances
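
The session-per-thread pattern can be sketched with the standard library alone. Config, Session, and process_all are hypothetical stand-ins rather than Retto's types; the stand-in session is deliberately made neither Send nor Sync so that the compiler enforces the same constraint described above. The shared configuration is wrapped in an Arc, and each thread builds its own session from it.

```rust
use std::marker::PhantomData;
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

// Hypothetical stand-ins; Retto's real types are not used here.
struct Config {
    max_side_len: u32,
}

/// Deliberately neither Send nor Sync (via the Rc marker), mirroring the
/// constraint described above.
struct Session {
    max_side_len: u32,
    _not_send: PhantomData<Rc<()>>,
}

impl Session {
    fn new(cfg: &Config) -> Self {
        Session { max_side_len: cfg.max_side_len, _not_send: PhantomData }
    }

    /// Placeholder for OCR: returns the input length capped by the config.
    fn run(&mut self, input: &[u8]) -> usize {
        input.len().min(self.max_side_len as usize)
    }
}

/// Splits the inputs across up to `workers` threads. Each thread builds its
/// own session from the shared configuration; sessions never cross threads.
fn process_all(cfg: Arc<Config>, inputs: Vec<Vec<u8>>, workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    let chunk_len = ((inputs.len() + workers - 1) / workers).max(1);
    let handles: Vec<_> = inputs
        .chunks(chunk_len)
        .map(|chunk| {
            let cfg = Arc::clone(&cfg); // share the configuration only
            let chunk = chunk.to_vec();
            thread::spawn(move || {
                let mut session = Session::new(&cfg); // created inside the thread
                chunk.iter().map(|img| session.run(img)).collect::<Vec<_>>()
            })
        })
        .collect();
    // Joining in spawn order keeps the results in input order.
    handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
}

fn main() {
    let cfg = Arc::new(Config { max_side_len: 2000 });
    let inputs = vec![vec![0u8; 10], vec![0u8; 20], vec![0u8; 3000]];
    println!("{:?}", process_all(cfg, inputs, 2));
}
```

Moving a Session into thread::spawn in this sketch would fail to compile, which is the property that makes the one-session-per-thread rule hard to violate by accident.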
