
Quickstart Guide

This guide walks you through running your first OCR with Retto. We’ll cover examples for Rust, CLI, and WebAssembly.

Prerequisites

Before starting, make sure you’ve installed Retto for your platform.

Rust Library Quickstart

Here’s a complete example of using Retto in a Rust application.
Step 1: Create a new Rust project

cargo new retto-example
cd retto-example
Step 2: Add Retto to Cargo.toml

[dependencies]
retto-core = { version = "0.1.5", features = ["serde", "backend-ort", "hf-hub"] }
anyhow = "1.0"
Step 3: Write your OCR code

Replace src/main.rs with:
use retto_core::prelude::*;
use std::fs;
use anyhow::Result;

fn main() -> Result<()> {
    // Configure the session
    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig {
            device: RettoOrtWorkerDevice::CPU,
            models: RettoOrtWorkerModelProvider::from_hf_hub_v4_default(),
        },
        max_side_len: 2000,
        min_side_len: 30,
        ..Default::default()
    };
    
    // Create OCR session
    let mut session = RettoSession::new(cfg)?;
    
    // Read image
    let image_data = fs::read("image.png")?;
    
    // Run OCR
    let result = session.run(image_data)?;
    
    // Print results
    println!("Found {} text regions", result.det_result.0.len());
    
    for (i, text) in result.rec_result.0.iter().enumerate() {
        println!("Text {}: {} (confidence: {:.2})", 
            i + 1, 
            text.text, 
            text.score
        );
    }
    
    Ok(())
}
Step 4: Run the example

# Place an image in the project directory
cargo run --release
On the first run, the models are downloaded automatically from the Hugging Face Hub and cached for later runs.

Expected Output

Found 3 text regions
Text 1: Hello World (confidence: 0.98)
Text 2: Welcome to Retto (confidence: 0.95)
Text 3: OCR Example (confidence: 0.97)

Advanced: Streaming Results

For real-time feedback, use streaming mode to get results as each stage completes:
use retto_core::prelude::*;
use std::sync::mpsc;

fn main() -> anyhow::Result<()> {
    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig {
            device: RettoOrtWorkerDevice::CPU,
            models: RettoOrtWorkerModelProvider::from_hf_hub_v4_default(),
        },
        ..Default::default()
    };
    
    let mut session = RettoSession::new(cfg)?;
    let image_data = std::fs::read("image.png")?;
    
    let (tx, rx) = mpsc::channel();
    
    // Run in streaming mode
    session.run_stream(image_data, tx)?;
    
    // Receive results as they complete
    for stage in rx {
        match stage {
            RettoWorkerStageResult::Det(det) => {
                println!("Detection complete: {} regions found", det.0.len());
            }
            RettoWorkerStageResult::Cls(cls) => {
                println!("Classification complete: {} orientations detected", cls.0.len());
            }
            RettoWorkerStageResult::Rec(rec) => {
                println!("Recognition complete!");
                for text in &rec.0 {
                    println!("  - {} (score: {:.2})", text.text, text.score);
                }
            }
        }
    }
    
    Ok(())
}

Using GPU Acceleration

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::Cuda(0), // Device ID 0
        models: RettoOrtWorkerModelProvider::from_hf_hub_v4_default(),
    },
    ..Default::default()
};
Update Cargo.toml to enable the CUDA backend. Keep the "hf-hub" feature if you load models from the Hugging Face Hub, as in the example above:
[dependencies]
retto-core = { version = "0.1.5", features = ["serde", "backend-ort-cuda", "hf-hub"] }

CLI Tool Quickstart

The CLI tool is perfect for batch processing multiple images.
Step 1: Prepare your images

Create a directory with images to process:
mkdir images
# Add some images to the directory
Step 2: Run OCR on all images

retto-cli --use-hf-hub true -i ./images

CLI Examples

The options in the table below can be combined. For example:

# Process all images in a directory, fetching models from the Hugging Face Hub
retto-cli --use-hf-hub true -i ./images

# Run on a GPU (CUDA device 0)
retto-cli --use-hf-hub true -i ./images --device cuda --device-id 0

# Use local model files instead of the Hugging Face Hub
retto-cli --use-hf-hub false \
    --det-model-path ./models/ch_PP-OCRv4_det_infer.onnx \
    --cls-model-path ./models/ch_ppocr_mobile_v2.0_cls_infer.onnx \
    --rec-model-path ./models/ch_PP-OCRv4_rec_infer.onnx \
    --rec-keys-path ./models/ppocr_keys_v1.txt \
    -i ./images

Expected Output

INFO retto_cli: Using Hugging Face Hub for models
INFO retto_cli: Found 5 files, processing...
DEBUG retto_core: Det result: 3 regions detected
DEBUG retto_core: Cls result: 3 orientations classified
DEBUG retto_core: Rec result: 3 texts recognized
INFO retto_cli: Successfully processed 5 images, avg time: 234.56ms

CLI Options

| Option | Description | Default |
| --- | --- | --- |
| --det-model-path | Path to detection model | ch_PP-OCRv4_det_infer.onnx |
| --cls-model-path | Path to classification model | ch_ppocr_mobile_v2.0_cls_infer.onnx |
| --rec-model-path | Path to recognition model | ch_PP-OCRv4_rec_infer.onnx |
| --rec-keys-path | Path to character dictionary | ppocr_keys_v1.txt |
| -i, --images | Directory containing images | (required) |
| --device | Device type: cpu, cuda, directml | cpu |
| --device-id | GPU device ID | 0 |
| --use-hf-hub | Download models from Hugging Face | true |

WebAssembly Quickstart

Use Retto in the browser with WebAssembly.
Step 1: Install the package

npm install @nekoimageland/retto-wasm
Step 2: Create an HTML file

<!DOCTYPE html>
<html>
<head>
    <title>Retto WASM Example</title>
</head>
<body>
    <h1>Retto OCR Demo</h1>
    <input type="file" id="imageInput" accept="image/*">
    <div id="results"></div>
    <script type="module" src="app.js"></script>
</body>
</html>
Step 3: Write the JavaScript code

Create app.js:
import { Retto } from '@nekoimageland/retto-wasm';

async function main() {
    // Load the WebAssembly module
    console.log('Loading Retto...');
    const retto = await Retto.load((progress) => {
        console.log(`Loading: ${(progress * 100).toFixed(0)}%`);
    });
    
    // Fetch models (you'll need to host these)
    const models = {
        det_model: await fetch('/models/ch_PP-OCRv4_det_infer.onnx')
            .then(r => r.arrayBuffer()),
        cls_model: await fetch('/models/ch_ppocr_mobile_v2.0_cls_infer.onnx')
            .then(r => r.arrayBuffer()),
        rec_model: await fetch('/models/ch_PP-OCRv4_rec_infer.onnx')
            .then(r => r.arrayBuffer()),
        rec_dict: await fetch('/models/ppocr_keys_v1.txt')
            .then(r => r.arrayBuffer()),
    };
    
    // Initialize Retto
    await retto.init(models);
    console.log('Retto initialized!');
    
    // Handle image uploads
    document.getElementById('imageInput').addEventListener('change', async (e) => {
        const file = e.target.files[0];
        if (!file) return;
        
        const arrayBuffer = await file.arrayBuffer();
        const results = document.getElementById('results');
        results.innerHTML = '<p>Processing...</p>';
        
        // Run OCR with streaming results
        for await (const stage of retto.recognize(arrayBuffer)) {
            if (stage.stage === 'det') {
                results.innerHTML += `<p>✓ Detection: ${stage.result.length} regions found</p>`;
            } else if (stage.stage === 'cls') {
                results.innerHTML += `<p>✓ Classification: ${stage.result.length} orientations</p>`;
            } else if (stage.stage === 'rec') {
                results.innerHTML += '<p>✓ Recognition complete!</p><ul>';
                for (const text of stage.result) {
                    results.innerHTML += `<li>${text.text} (${(text.score * 100).toFixed(1)}%)</li>`;
                }
                results.innerHTML += '</ul>';
            }
        }
    });
}

main().catch(console.error);
Step 4: Serve and test

Use a local development server:
npx serve .
Open your browser and upload an image to see OCR results!

TypeScript Example

import { Retto, type RettoModel, type RettoWorkerStage } from '@nekoimageland/retto-wasm';

async function runOCR(imageData: ArrayBuffer): Promise<string[]> {
    const retto = await Retto.load();
    
    // For embedded builds (if available)
    if (retto.is_embed_build) {
        await retto.init();
    } else {
        // Load models separately
        const models: RettoModel = {
            det_model: await fetchModel('det'),
            cls_model: await fetchModel('cls'),
            rec_model: await fetchModel('rec'),
            rec_dict: await fetchModel('dict'),
        };
        await retto.init(models);
    }
    
    const texts: string[] = [];
    
    for await (const stage of retto.recognize(imageData)) {
        if (stage.stage === 'rec') {
            texts.push(...stage.result.map(r => r.text));
        }
    }
    
    return texts;
}

async function fetchModel(type: string): Promise<ArrayBuffer> {
    const urls: Record<string, string> = {
        det: '/models/ch_PP-OCRv4_det_infer.onnx',
        cls: '/models/ch_ppocr_mobile_v2.0_cls_infer.onnx',
        rec: '/models/ch_PP-OCRv4_rec_infer.onnx',
        dict: '/models/ppocr_keys_v1.txt',
    };
    const response = await fetch(urls[type]);
    return response.arrayBuffer();
}

Understanding the Results

All platforms return structured results from the OCR pipeline:

Detection Results

pub struct DetProcessorInnerResult {
    pub boxes: PointBox,  // Bounding box coordinates
    pub score: f32,       // Detection confidence (0.0-1.0)
}
Each detected text region includes:
  • boxes - Four corner points defining the text region
  • score - Confidence score of the detection
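In practice you often want to discard weak detections before further processing. Here is a minimal, self-contained sketch of score-based filtering; `DetRegion`, `keep_confident`, and the 0.5 threshold are illustrative stand-ins mirroring the struct above, not part of the Retto API:

```rust
// Simplified stand-in mirroring DetProcessorInnerResult; the real type
// lives in retto_core and carries a PointBox rather than a plain array.
#[derive(Debug)]
struct DetRegion {
    boxes: [(f32, f32); 4], // four corner points of the text region
    score: f32,             // detection confidence, 0.0-1.0
}

// Keep only regions whose detection confidence meets the threshold.
fn keep_confident(regions: Vec<DetRegion>, threshold: f32) -> Vec<DetRegion> {
    regions.into_iter().filter(|r| r.score >= threshold).collect()
}

fn main() {
    let regions = vec![
        DetRegion { boxes: [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)], score: 0.92 },
        DetRegion { boxes: [(0.0, 6.0), (10.0, 6.0), (10.0, 11.0), (0.0, 11.0)], score: 0.31 },
    ];
    let kept = keep_confident(regions, 0.5);
    println!("{} region(s) kept", kept.len()); // prints "1 region(s) kept"
}
```

A threshold around 0.5 is a common starting point, but the right value depends on your images and how costly false positives are downstream.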

Classification Results

pub struct ClsProcessorSingleResult {
    pub label: ClsPostProcessLabel,  // Orientation label
}

pub struct ClsPostProcessLabel {
    pub label: i32,   // 0, 90, 180, or 270 degrees
    pub score: f32,   // Classification confidence
}

Recognition Results

pub struct RecProcessorSingleResult {
    pub text: String,  // Recognized text
    pub score: f32,    // Recognition confidence
}
Scores range from 0.0 to 1.0, where higher values indicate greater confidence.
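A common post-processing step is to join the recognized lines into a single string while skipping low-confidence results. The sketch below uses a simplified `RecText` stand-in mirroring `RecProcessorSingleResult`; the `join_confident` helper and the 0.6 threshold are illustrative, not part of the Retto API:

```rust
// Simplified stand-in mirroring RecProcessorSingleResult above.
struct RecText {
    text: String, // recognized text
    score: f32,   // recognition confidence, 0.0-1.0
}

// Join recognized lines into one string, skipping results below the
// confidence threshold.
fn join_confident(results: &[RecText], threshold: f32) -> String {
    results
        .iter()
        .filter(|r| r.score >= threshold)
        .map(|r| r.text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let results = vec![
        RecText { text: "Hello World".into(), score: 0.98 },
        RecText { text: "n0isy".into(), score: 0.42 },
        RecText { text: "OCR Example".into(), score: 0.97 },
    ];
    // prints "Hello World" and "OCR Example"; the 0.42 result is dropped
    println!("{}", join_confident(&results, 0.6));
}
```

Tune the threshold per use case: a higher value yields cleaner text at the cost of dropping genuinely correct but low-confidence lines.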

Next Steps

API Reference

Explore the complete API documentation

Configuration

Learn about advanced configuration options

Examples

See more practical examples and use cases

Core Concepts

Understand Retto’s architecture and design
