Installation

Add retto-core to your Cargo.toml:
Cargo.toml
[dependencies]
retto-core = "0.1.5"

Feature Flags

Retto provides several feature flags to customize your build:
Cargo.toml
[dependencies]
retto-core = { version = "0.1.5", features = [
    "serde",           # Enable JSON serialization support
    "backend-ort",     # Enable ONNX Runtime backend (CPU)
    "hf-hub",          # Enable HuggingFace Hub model loading
]}

Note: The backend-ort feature is required to use the default ONNX Runtime worker. Additional backend features enable GPU acceleration (see Backends).

Creating a Session

The core of Retto is the RettoSession, which manages the OCR pipeline:
use retto_core::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create configuration with defaults
    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig::default(),
        ..Default::default()
    };
    
    // Initialize the session
    let mut session = RettoSession::new(cfg)?;
    
    Ok(())
}

Configuration Options

RettoSessionConfig provides fine-grained control over the OCR pipeline:
use retto_core::prelude::*;

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::CPU,
        models: RettoOrtWorkerModelProvider::from_hf_hub_v4_default(),
    },
    max_side_len: 2000,  // Maximum image dimension
    min_side_len: 30,     // Minimum image dimension
    det_processor_config: DetProcessorConfig::default(),
    cls_processor_config: ClsProcessorConfig::default(),
    rec_processor_config: RecProcessorConfig::default(),
};
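
// Alternatively, set only the worker configuration and keep the
// defaults for every other field:
let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig::default(),
    ..Default::default()
};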
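
To make the role of side-length limits such as max_side_len and min_side_len more concrete, here is a minimal sketch of how an OCR pipeline might turn them into a resize factor. This is an assumption about typical behavior, not a description of Retto's internals; the function name fit_scale and the rule that the upper limit wins are both hypothetical.

```rust
/// Hypothetical helper (not part of Retto): returns the factor by which an
/// image of `width` x `height` pixels would be scaled so that its longest
/// side does not exceed `max_side_len` and, where possible, its shortest
/// side is at least `min_side_len`.
fn fit_scale(width: u32, height: u32, max_side_len: u32, min_side_len: u32) -> f32 {
    // Degenerate images are left untouched to avoid dividing by zero.
    if width == 0 || height == 0 {
        return 1.0;
    }
    let longest = width.max(height) as f32;
    let shortest = width.min(height) as f32;
    let max_len = max_side_len as f32;
    let min_len = min_side_len as f32;

    if longest > max_len {
        // Too large: shrink so the longest side equals the limit.
        max_len / longest
    } else if shortest < min_len {
        // Too small: enlarge, but never push the longest side past the limit.
        (min_len / shortest).min(max_len / longest)
    } else {
        // Already within both limits.
        1.0
    }
}
```

With the values from the configuration above, a 4000 x 1000 image would be halved, while an 800 x 600 image would be left at its original size.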

Running OCR

1. Load an image

Read your image file as bytes:
use std::fs;

let image_data = fs::read("image.png")?;

2. Run OCR

Pass the image data to the session:
let result = session.run(image_data)?;

3. Process results

Access detection, classification, and recognition results:
// Detection results (text regions)
for det in &result.det_result.0 {
    println!("Box: {:?}, Score: {}", det.boxes, det.score);
}

// Classification results (text orientation)
for cls in &result.cls_result.0 {
    println!("Rotation: {}°, Score: {}", cls.label.label, cls.label.score);
}

// Recognition results (extracted text)
for rec in &result.rec_result.0 {
    println!("Text: {}, Score: {}", rec.text, rec.score);
}
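
A common follow-up to the steps above is to keep only confident recognition results and join them into a single string. The sketch below uses a stand-in struct, RecLine, that mirrors only the text and score fields shown in the guide. The struct name, the f32 score type, and the helper confident_text are assumptions for illustration, not part of Retto's API.

```rust
/// Stand-in for one recognition result. Only the `text` and `score` fields
/// used in the guide are mirrored; this is not Retto's actual type.
#[derive(Debug, Clone, PartialEq)]
struct RecLine {
    text: String,
    score: f32,
}

/// Keeps the lines whose score is at least `min_score` and joins their
/// text with newlines, preserving the original order.
fn confident_text(lines: &[RecLine], min_score: f32) -> String {
    lines
        .iter()
        .filter(|line| line.score >= min_score)
        .map(|line| line.text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}
```

In real code the same filter and join could be applied while iterating over result.rec_result.0, assuming its elements expose text and score as in the loop above.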

Complete Example

Here’s a complete working example:
main.rs
use retto_core::prelude::*;
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure the session
    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig::default(),
        ..Default::default()
    };
    
    // Create the session
    let mut session = RettoSession::new(cfg)?;
    
    // Load and process an image
    let image_data = fs::read("sample.png")?;
    let result = session.run(image_data)?;
    
    // Print recognized text
    println!("Found {} text regions", result.rec_result.0.len());
    for (i, rec) in result.rec_result.0.iter().enumerate() {
        println!("Region {}: {} (confidence: {:.2})", 
                 i + 1, rec.text, rec.score);
    }
    
    Ok(())
}

Streaming Results

For real-time feedback during OCR processing, use run_stream:
use retto_core::prelude::*;
use std::sync::mpsc;

let (tx, rx) = mpsc::channel::<RettoWorkerStageResult>();

// Run OCR in streaming mode
session.run_stream(image_data, tx)?;

// Process results as they arrive
for stage in rx {
    match stage {
        RettoWorkerStageResult::Det(det) => {
            println!("Detection complete: {} regions found", det.0.len());
        }
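        // The payload is not inspected here, so it is bound as `_cls`
        // to avoid an unused-variable warning.
        RettoWorkerStageResult::Cls(_cls) => {
            println!("Classification complete");
        }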
        RettoWorkerStageResult::Rec(rec) => {
            println!("Recognition complete");
            for r in &rec.0 {
                println!("  - {}", r.text);
            }
        }
    }
}
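
The guide does not say whether run_stream blocks until all stages have finished. If it does, a receiving loop placed after the call only starts once every message is already queued. The sketch below shows the general channel pattern with the producer on a worker thread, so the consumer can react to each stage as it arrives. It uses only the standard library; Stage and fake_run_stream are hypothetical stand-ins for RettoWorkerStageResult and the real pipeline.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a per-stage result (hypothetical, not Retto's enum).
#[derive(Debug, PartialEq)]
enum Stage {
    Det(usize),
    Cls(usize),
    Rec(Vec<String>),
}

/// Hypothetical producer that mimics a pipeline sending one message per stage.
fn fake_run_stream(tx: mpsc::Sender<Stage>) {
    // A send only fails if the receiver was dropped, so errors are ignored.
    let _ = tx.send(Stage::Det(2));
    let _ = tx.send(Stage::Cls(2));
    let _ = tx.send(Stage::Rec(vec!["hello".to_string(), "world".to_string()]));
    // `tx` is dropped here, which closes the channel.
}

/// Runs the producer on a worker thread and collects stages in arrival order.
fn collect_stages() -> Vec<Stage> {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || fake_run_stream(tx));
    // The iterator ends once every sender has been dropped.
    let stages: Vec<Stage> = rx.iter().collect();
    worker.join().expect("worker thread panicked");
    stages
}
```

Because the sender is moved into the worker and dropped when it finishes, the receiving side terminates on its own without an explicit "done" message.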

Test Examples

From the test suite in session.rs:206, here’s how Retto handles rotated text:
use retto_core::prelude::*;
use image::{ImageFormat, RgbImage};
use std::io::Cursor;

let text = "玩原神玩的";
let (w, h) = (200.0, 50.0);

// Create test image (implementation details omitted)
let image: RgbImage = create_rotated_text_image(text, 180.0, w, h);

// Encode to PNG
let mut buf = Vec::new();
image.write_to(&mut Cursor::new(&mut buf), ImageFormat::Png)?;

// Run OCR
let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig::default(),
    ..Default::default()
};
let mut session = RettoSession::new(cfg)?;
let result = session.run(buf)?;

// Verify results
assert_eq!(result.cls_result.0[0].label.label, 180);
assert_eq!(result.rec_result.0[0].text, text);

Note: The session is not thread-safe by default. To process multiple images concurrently, create one session per thread or guard a shared session with a synchronization primitive such as a Mutex.
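
One way to share a single session between threads is to wrap it in Arc<Mutex<...>>. The sketch below shows the pattern with a stand-in type, FakeSession, whose run method takes &mut self. It assumes the real session needs exclusive access in the same way and that it can be moved between threads, neither of which this guide confirms. All names in the block are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for a session whose `run` needs exclusive (`&mut self`) access.
struct FakeSession {
    runs: usize,
}

impl FakeSession {
    /// Pretends to process an image and returns its size in bytes.
    fn run(&mut self, image: Vec<u8>) -> usize {
        self.runs += 1;
        image.len()
    }
}

/// Processes each image on its own thread while sharing one session.
/// The mutex serializes the calls, so this is safe but not parallel.
/// Returns the per-image results in input order and the total run count.
fn run_shared(images: Vec<Vec<u8>>) -> (Vec<usize>, usize) {
    let session = Arc::new(Mutex::new(FakeSession { runs: 0 }));
    let handles: Vec<_> = images
        .into_iter()
        .map(|image| {
            let session = Arc::clone(&session);
            thread::spawn(move || {
                let mut guard = session.lock().expect("session mutex poisoned");
                guard.run(image)
            })
        })
        .collect();
    // Joining in spawn order keeps the results aligned with the inputs.
    let sizes: Vec<usize> = handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect();
    let runs = session.lock().expect("session mutex poisoned").runs;
    (sizes, runs)
}
```

A mutex keeps memory use low because only one set of models is loaded, but calls run one at a time. Creating one session per thread trades memory for real parallelism.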

Next Steps

Model Loading

Learn about different ways to load OCR models

Backends

Configure GPU acceleration and other backends
