Overview

This example demonstrates the basic usage of Retto for performing OCR on images. You’ll learn how to:
  • Initialize a RettoSession with default configuration
  • Load and process an image
  • Extract text detection boxes, rotation angles, and recognized text
  • Parse the results

Complete Example

use retto_core::prelude::*;
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create default configuration
    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig::default(),
        ..Default::default()
    };

    // Initialize session
    let mut session = RettoSession::new(cfg)?;

    // Load image from file
    let image_bytes = fs::read("input.png")?;

    // Run OCR
    let result = session.run(image_bytes)?;

    // Process detection results
    println!("Found {} text regions", result.det_result.0.len());
    
    for (idx, det) in result.det_result.0.iter().enumerate() {
        // Get bounding box coordinates
        let boxes = &det.boxes;
        println!("\nText region {}:", idx);
        println!("  Top-left: ({}, {})", boxes.tl().x, boxes.tl().y);
        println!("  Top-right: ({}, {})", boxes.tr().x, boxes.tr().y);
        println!("  Bottom-right: ({}, {})", boxes.br().x, boxes.br().y);
        println!("  Bottom-left: ({}, {})", boxes.bl().x, boxes.bl().y);
        println!("  Detection score: {:.2}", det.score);
    }

    // Process classification results (rotation angles)
    for (idx, cls) in result.cls_result.0.iter().enumerate() {
        println!("\nText region {} rotation:", idx);
        println!("  Angle: {}°", cls.label.label);
        println!("  Confidence: {:.2}", cls.label.score);
    }

    // Process recognition results (actual text)
    for (idx, rec) in result.rec_result.0.iter().enumerate() {
        println!("\nText region {} content:", idx);
        println!("  Text: {}", rec.text);
        println!("  Confidence: {:.2}", rec.score);
    }

    Ok(())
}

Step-by-Step Breakdown

1. Create Configuration

Start by creating a RettoSessionConfig with default settings:
let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig::default(),
    ..Default::default()
};
The default configuration automatically downloads models from Hugging Face (when the hf-hub feature is enabled) or uses local models.
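To use automatic downloads, the hf-hub feature must be enabled on the dependency. A sketch of what that declaration might look like in Cargo.toml (the package name and version specifier here are assumptions; check crates.io for the actual values):

```toml
# Cargo.toml — illustrative; verify the package name and current version
[dependencies]
retto-core = { version = "*", features = ["hf-hub"] }
```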

2. Initialize Session

Create a new session with your configuration:
let mut session = RettoSession::new(cfg)?;
The session initializes the ONNX Runtime backend and loads the three models (detection, classification, recognition).

3. Load and Process Image

Read your image file as bytes:
let image_bytes = fs::read("input.png")?;
let result = session.run(image_bytes)?;
Retto accepts images in common formats (PNG, JPEG, etc.) as byte arrays.
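Because run takes plain bytes, the image can come from disk, the network, or be embedded at compile time. As a quick sanity check before invoking OCR, you can verify a buffer's magic number; the helper below is illustrative only and not part of Retto's API:

```rust
/// Returns true if the buffer starts with the 8-byte PNG signature.
/// Illustrative helper only — not part of Retto's API.
fn looks_like_png(bytes: &[u8]) -> bool {
    bytes.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A])
}

fn main() {
    assert!(looks_like_png(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]));
    assert!(!looks_like_png(b"plain text"));
    println!("signature check ok");
}
```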

4. Parse Results

The RettoWorkerResult contains three components:

Detection Results (det_result)

Contains bounding boxes and confidence scores for detected text regions:
for det in result.det_result.0.iter() {
    // Access the four corner points
    let tl = det.boxes.tl();  // Top-left
    let tr = det.boxes.tr();  // Top-right
    let br = det.boxes.br();  // Bottom-right
    let bl = det.boxes.bl();  // Bottom-left
    
    // Get detection confidence
    let score = det.score;
}
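The four corner points describe a possibly rotated quadrilateral. If you only need an axis-aligned rectangle, for example for cropping, you can take the min/max over the corners. This sketch uses plain (x, y) tuples in place of Retto's point type:

```rust
/// Collapse four corner points into an axis-aligned bounding rectangle,
/// returned as (top-left, bottom-right). Tuples stand in for Retto's points.
fn bounding_rect(corners: [(f32, f32); 4]) -> ((f32, f32), (f32, f32)) {
    let (mut min_x, mut min_y) = corners[0];
    let (mut max_x, mut max_y) = corners[0];
    for &(x, y) in &corners[1..] {
        min_x = min_x.min(x);
        min_y = min_y.min(y);
        max_x = max_x.max(x);
        max_y = max_y.max(y);
    }
    ((min_x, min_y), (max_x, max_y))
}

fn main() {
    // A slightly rotated quad: tl, tr, br, bl
    let rect = bounding_rect([(10.0, 5.0), (110.0, 8.0), (108.0, 40.0), (8.0, 37.0)]);
    assert_eq!(rect, ((8.0, 5.0), (110.0, 40.0)));
}
```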

Classification Results (cls_result)

Contains rotation angles (0° or 180°) for each detected region:
for cls in result.cls_result.0.iter() {
    let angle = cls.label.label;      // 0 or 180
    let confidence = cls.label.score; // 0.0 to 1.0
}
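A common use of the classifier output is deciding whether a cropped region should be rotated 180° before display. A trivial sketch of that decision (the confidence threshold is an arbitrary example, not a Retto default):

```rust
/// Flip a crop only when the classifier both reports 180° and is confident.
/// Illustrative only — the threshold is an arbitrary example value.
fn needs_flip(angle: u32, confidence: f32) -> bool {
    const MIN_CONFIDENCE: f32 = 0.9; // example threshold
    angle == 180 && confidence >= MIN_CONFIDENCE
}

fn main() {
    assert!(needs_flip(180, 0.95));
    assert!(!needs_flip(180, 0.5)); // not confident enough
    assert!(!needs_flip(0, 0.99));  // already upright
}
```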

Recognition Results (rec_result)

Contains the actual recognized text:
for rec in result.rec_result.0.iter() {
    let text = &rec.text;           // Recognized text
    let confidence = rec.score;      // 0.0 to 1.0
}
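Recognition scores make it easy to drop noisy results. A minimal sketch using a local struct that mirrors the text/score fields shown above (not Retto's own type):

```rust
/// Local stand-in for a recognition result (mirrors the fields shown above).
struct Rec {
    text: String,
    score: f32,
}

/// Keep only recognitions at or above the given confidence threshold.
fn filter_confident(recs: &[Rec], threshold: f32) -> Vec<&str> {
    recs.iter()
        .filter(|r| r.score >= threshold)
        .map(|r| r.text.as_str())
        .collect()
}

fn main() {
    let recs = vec![
        Rec { text: "INVOICE".into(), score: 0.98 },
        Rec { text: "l1|!".into(), score: 0.31 },
    ];
    assert_eq!(filter_confident(&recs, 0.8), vec!["INVOICE"]);
}
```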

Complete Working Example

Here’s a more practical example that processes an image and outputs formatted results:
use retto_core::prelude::*;
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize with default config
    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig::default(),
        ..Default::default()
    };
    let mut session = RettoSession::new(cfg)?;

    // Process image
    let img_bytes = fs::read("document.png")?;
    let result = session.run(img_bytes)?;

    // Print results in a structured format
    println!("OCR Results");
    println!("{}", "=".repeat(50));
    
    for (idx, (det, cls, rec)) in result.det_result.0.iter()
        .zip(result.cls_result.0.iter())
        .zip(result.rec_result.0.iter())
        .map(|((d, c), r)| (d, c, r))
        .enumerate()
    {
        println!("\n[Region {}]", idx + 1);
        println!("Text: {}", rec.text);
        println!("Confidence: {:.1}%", rec.score * 100.0);
        println!("Rotation: {}°", cls.label.label);
        println!("Position: TL({:.0},{:.0}) BR({:.0},{:.0})",
            det.boxes.tl().x, det.boxes.tl().y,
            det.boxes.br().x, det.boxes.br().y
        );
    }

    Ok(())
}
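Detection order is not guaranteed to match reading order. If you assemble the recognized strings into a document, one common heuristic (a sketch, not part of Retto) is to sort regions by their top-left corner, top-to-bottom and then left-to-right:

```rust
/// Sort (top-left point, text) pairs into approximate reading order:
/// by y first, then x. Tuples stand in for Retto's point type.
fn reading_order(mut regions: Vec<((f32, f32), String)>) -> Vec<String> {
    regions.sort_by(|((ax, ay), _), ((bx, by), _)| {
        ay.partial_cmp(by).unwrap().then(ax.partial_cmp(bx).unwrap())
    });
    regions.into_iter().map(|(_, text)| text).collect()
}

fn main() {
    let regions = vec![
        ((200.0, 10.0), "right".to_string()),
        ((10.0, 100.0), "below".to_string()),
        ((10.0, 10.0), "left".to_string()),
    ];
    assert_eq!(reading_order(regions), vec!["left", "right", "below"]);
}
```

Note this simple sort can interleave columns in multi-column layouts; it is a starting point, not a general layout-analysis solution.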

Model Loading

By default, Retto uses the Hugging Face Hub to download models automatically:
// Automatic download from Hugging Face (default)
let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig::default(),
    ..Default::default()
};
You can also specify local model paths:
let models = RettoOrtWorkerModelProvider(RettoWorkerModelProvider {
    det: RettoWorkerModelSource::Path("ch_PP-OCRv4_det_infer.onnx".into()),
    rec: RettoWorkerModelSource::Path("ch_PP-OCRv4_rec_infer.onnx".into()),
    cls: RettoWorkerModelSource::Path("ch_ppocr_mobile_v2.0_cls_infer.onnx".into()),
});

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::CPU,
        models,
    },
    ..Default::default()
};

Next Steps

Streaming OCR

Process OCR stages with real-time callbacks

Custom Configuration

Fine-tune detection, classification, and recognition parameters
