
Quickstart Guide

This guide walks you through running your first OCR with Retto. We’ll cover examples for Rust, CLI, and WebAssembly.

Prerequisites

Before starting, make sure you’ve installed Retto for your platform.

Rust Library Quickstart

Here’s a complete example of using Retto in a Rust application.
Step 1: Create a new Rust project

cargo new retto-example
cd retto-example
Step 2: Add Retto to Cargo.toml

[dependencies]
retto-core = { version = "0.1.5", features = ["serde", "backend-ort", "hf-hub"] }
anyhow = "1.0"
Step 3: Write your OCR code

Replace src/main.rs with:
use retto_core::prelude::*;
use std::fs;
use anyhow::Result;

fn main() -> Result<()> {
    // Configure the session
    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig {
            device: RettoOrtWorkerDevice::CPU,
            models: RettoOrtWorkerModelProvider::from_hf_hub_v4_default(),
        },
        max_side_len: 2000,
        min_side_len: 30,
        ..Default::default()
    };
    
    // Create OCR session
    let mut session = RettoSession::new(cfg)?;
    
    // Read image
    let image_data = fs::read("image.png")?;
    
    // Run OCR
    let result = session.run(image_data)?;
    
    // Print results
    println!("Found {} text regions", result.det_result.0.len());
    
    for (i, text) in result.rec_result.0.iter().enumerate() {
        println!("Text {}: {} (confidence: {:.2})", 
            i + 1, 
            text.text, 
            text.score
        );
    }
    
    Ok(())
}
Step 4: Run the example

# Place an image in the project directory
cargo run --release
On the first run, the models are downloaded automatically from the Hugging Face Hub and cached for later runs.

Expected Output

Found 3 text regions
Text 1: Hello World (confidence: 0.98)
Text 2: Welcome to Retto (confidence: 0.95)
Text 3: OCR Example (confidence: 0.97)

Advanced: Streaming Results

For real-time feedback, use streaming mode to get results as each stage completes:
use retto_core::prelude::*;
use std::sync::mpsc;

fn main() -> anyhow::Result<()> {
    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig {
            device: RettoOrtWorkerDevice::CPU,
            models: RettoOrtWorkerModelProvider::from_hf_hub_v4_default(),
        },
        ..Default::default()
    };
    
    let mut session = RettoSession::new(cfg)?;
    let image_data = std::fs::read("image.png")?;
    
    let (tx, rx) = mpsc::channel();
    
    // Run in streaming mode
    session.run_stream(image_data, tx)?;
    
    // Receive results as they complete
    for stage in rx {
        match stage {
            RettoWorkerStageResult::Det(det) => {
                println!("Detection complete: {} regions found", det.0.len());
            }
            RettoWorkerStageResult::Cls(cls) => {
                println!("Classification complete: {} orientations detected", cls.0.len());
            }
            RettoWorkerStageResult::Rec(rec) => {
                println!("Recognition complete!");
                for text in &rec.0 {
                    println!("  - {} (score: {:.2})", text.text, text.score);
                }
            }
        }
    }
    
    Ok(())
}

Using GPU Acceleration

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::Cuda(0), // Device ID 0
        models: RettoOrtWorkerModelProvider::from_hf_hub_v4_default(),
    },
    ..Default::default()
};
Update Cargo.toml to enable the CUDA backend. Keep the "hf-hub" feature if you load models from the Hugging Face Hub, as in the example above:
[dependencies]
retto-core = { version = "0.1.5", features = ["serde", "backend-ort-cuda", "hf-hub"] }

CLI Tool Quickstart

The CLI tool is perfect for batch processing multiple images.
Step 1: Prepare your images

Create a directory with images to process:
mkdir images
# Add some images to the directory
Step 2: Run OCR on all images

retto-cli --use-hf-hub true -i ./images

CLI Examples

The options in the table below can be combined. For example:

# Process all images in a directory, fetching models from the Hugging Face Hub
retto-cli --use-hf-hub true -i ./images

# Run on a GPU (CUDA device 0)
retto-cli --use-hf-hub true -i ./images --device cuda --device-id 0

# Use local model files instead of the Hugging Face Hub
retto-cli --use-hf-hub false \
    --det-model-path ./models/ch_PP-OCRv4_det_infer.onnx \
    --cls-model-path ./models/ch_ppocr_mobile_v2.0_cls_infer.onnx \
    --rec-model-path ./models/ch_PP-OCRv4_rec_infer.onnx \
    --rec-keys-path ./models/ppocr_keys_v1.txt \
    -i ./images

Expected Output

INFO retto_cli: Using Hugging Face Hub for models
INFO retto_cli: Found 5 files, processing...
DEBUG retto_core: Det result: 3 regions detected
DEBUG retto_core: Cls result: 3 orientations classified
DEBUG retto_core: Rec result: 3 texts recognized
INFO retto_cli: Successfully processed 5 images, avg time: 234.56ms

CLI Options

| Option | Description | Default |
| --- | --- | --- |
| --det-model-path | Path to detection model | ch_PP-OCRv4_det_infer.onnx |
| --cls-model-path | Path to classification model | ch_ppocr_mobile_v2.0_cls_infer.onnx |
| --rec-model-path | Path to recognition model | ch_PP-OCRv4_rec_infer.onnx |
| --rec-keys-path | Path to character dictionary | ppocr_keys_v1.txt |
| -i, --images | Directory containing images | (required) |
| --device | Device type: cpu, cuda, directml | cpu |
| --device-id | GPU device ID | 0 |
| --use-hf-hub | Download models from Hugging Face | true |

WebAssembly Quickstart

Use Retto in the browser with WebAssembly.
Step 1: Install the package

npm install @nekoimageland/retto-wasm
Step 2: Create an HTML file

<!DOCTYPE html>
<html>
<head>
    <title>Retto WASM Example</title>
</head>
<body>
    <h1>Retto OCR Demo</h1>
    <input type="file" id="imageInput" accept="image/*">
    <div id="results"></div>
    <script type="module" src="app.js"></script>
</body>
</html>
Step 3: Write the JavaScript code

Create app.js:
import { Retto } from '@nekoimageland/retto-wasm';

async function main() {
    // Load the WebAssembly module
    console.log('Loading Retto...');
    const retto = await Retto.load((progress) => {
        console.log(`Loading: ${(progress * 100).toFixed(0)}%`);
    });
    
    // Fetch models (you'll need to host these)
    const models = {
        det_model: await fetch('/models/ch_PP-OCRv4_det_infer.onnx')
            .then(r => r.arrayBuffer()),
        cls_model: await fetch('/models/ch_ppocr_mobile_v2.0_cls_infer.onnx')
            .then(r => r.arrayBuffer()),
        rec_model: await fetch('/models/ch_PP-OCRv4_rec_infer.onnx')
            .then(r => r.arrayBuffer()),
        rec_dict: await fetch('/models/ppocr_keys_v1.txt')
            .then(r => r.arrayBuffer()),
    };
    
    // Initialize Retto
    await retto.init(models);
    console.log('Retto initialized!');
    
    // Handle image uploads
    document.getElementById('imageInput').addEventListener('change', async (e) => {
        const file = e.target.files[0];
        if (!file) return;
        
        const arrayBuffer = await file.arrayBuffer();
        const results = document.getElementById('results');
        results.innerHTML = '<p>Processing...</p>';
        
        // Run OCR with streaming results
        for await (const stage of retto.recognize(arrayBuffer)) {
            if (stage.stage === 'det') {
                results.innerHTML += `<p>✓ Detection: ${stage.result.length} regions found</p>`;
            } else if (stage.stage === 'cls') {
                results.innerHTML += `<p>✓ Classification: ${stage.result.length} orientations</p>`;
            } else if (stage.stage === 'rec') {
                results.innerHTML += '<p>✓ Recognition complete!</p><ul>';
                for (const text of stage.result) {
                    results.innerHTML += `<li>${text.text} (${(text.score * 100).toFixed(1)}%)</li>`;
                }
                results.innerHTML += '</ul>';
            }
        }
    });
}

main().catch(console.error);
Step 4: Serve and test

Use a local development server:
npx serve .
Open your browser and upload an image to see OCR results!

TypeScript Example

import { Retto, type RettoModel, type RettoWorkerStage } from '@nekoimageland/retto-wasm';

async function runOCR(imageData: ArrayBuffer): Promise<string[]> {
    const retto = await Retto.load();
    
    // For embedded builds (if available)
    if (retto.is_embed_build) {
        await retto.init();
    } else {
        // Load models separately
        const models: RettoModel = {
            det_model: await fetchModel('det'),
            cls_model: await fetchModel('cls'),
            rec_model: await fetchModel('rec'),
            rec_dict: await fetchModel('dict'),
        };
        await retto.init(models);
    }
    
    const texts: string[] = [];
    
    for await (const stage of retto.recognize(imageData)) {
        if (stage.stage === 'rec') {
            texts.push(...stage.result.map(r => r.text));
        }
    }
    
    return texts;
}

async function fetchModel(type: string): Promise<ArrayBuffer> {
    const urls: Record<string, string> = {
        det: '/models/ch_PP-OCRv4_det_infer.onnx',
        cls: '/models/ch_ppocr_mobile_v2.0_cls_infer.onnx',
        rec: '/models/ch_PP-OCRv4_rec_infer.onnx',
        dict: '/models/ppocr_keys_v1.txt',
    };
    const response = await fetch(urls[type]);
    return response.arrayBuffer();
}

Understanding the Results

All platforms return structured results from the OCR pipeline:

Detection Results

pub struct DetProcessorInnerResult {
    pub boxes: PointBox,  // Bounding box coordinates
    pub score: f32,       // Detection confidence (0.0-1.0)
}
Each detected text region includes:
  • boxes - Four corner points defining the text region
  • score - Confidence score of the detection
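In practice you often want to discard weak detections before further processing. Here is a minimal, self-contained sketch of score-based filtering; `DetRegion`, `keep_confident`, and the 0.5 threshold are illustrative stand-ins mirroring the struct above, not part of the Retto API:

```rust
// Simplified stand-in mirroring DetProcessorInnerResult; the real type
// lives in retto_core and carries a PointBox rather than a plain array.
#[derive(Debug)]
struct DetRegion {
    boxes: [(f32, f32); 4], // four corner points of the text region
    score: f32,             // detection confidence, 0.0-1.0
}

// Keep only regions whose detection confidence meets the threshold.
fn keep_confident(regions: Vec<DetRegion>, threshold: f32) -> Vec<DetRegion> {
    regions.into_iter().filter(|r| r.score >= threshold).collect()
}

fn main() {
    let regions = vec![
        DetRegion { boxes: [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)], score: 0.92 },
        DetRegion { boxes: [(0.0, 6.0), (10.0, 6.0), (10.0, 11.0), (0.0, 11.0)], score: 0.31 },
    ];
    let kept = keep_confident(regions, 0.5);
    println!("{} region(s) kept", kept.len()); // prints "1 region(s) kept"
}
```

A threshold around 0.5 is a common starting point, but the right value depends on your images and how costly false positives are downstream.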

Classification Results

pub struct ClsProcessorSingleResult {
    pub label: ClsPostProcessLabel,  // Orientation label
}

pub struct ClsPostProcessLabel {
    pub label: i32,   // 0, 90, 180, or 270 degrees
    pub score: f32,   // Classification confidence
}

Recognition Results

pub struct RecProcessorSingleResult {
    pub text: String,  // Recognized text
    pub score: f32,    // Recognition confidence
}
Scores range from 0.0 to 1.0, where higher values indicate greater confidence.
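A common post-processing step is to join the recognized lines into a single string while skipping low-confidence results. The sketch below uses a simplified `RecText` stand-in mirroring `RecProcessorSingleResult`; the `join_confident` helper and the 0.6 threshold are illustrative, not part of the Retto API:

```rust
// Simplified stand-in mirroring RecProcessorSingleResult above.
struct RecText {
    text: String, // recognized text
    score: f32,   // recognition confidence, 0.0-1.0
}

// Join recognized lines into one string, skipping results below the
// confidence threshold.
fn join_confident(results: &[RecText], threshold: f32) -> String {
    results
        .iter()
        .filter(|r| r.score >= threshold)
        .map(|r| r.text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let results = vec![
        RecText { text: "Hello World".into(), score: 0.98 },
        RecText { text: "n0isy".into(), score: 0.42 },
        RecText { text: "OCR Example".into(), score: 0.97 },
    ];
    // prints "Hello World" and "OCR Example"; the 0.42 result is dropped
    println!("{}", join_confident(&results, 0.6));
}
```

Tune the threshold per use case: a higher value yields cleaner text at the cost of dropping genuinely correct but low-confidence lines.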

Next Steps

API Reference

Explore the complete API documentation

Configuration

Learn about advanced configuration options

Examples

See more practical examples and use cases

Core Concepts

Understand Retto’s architecture and design
