# Quickstart Guide

This guide walks you through running your first OCR with Retto. We'll cover examples for Rust, CLI, and WebAssembly.

## Prerequisites

Before starting, make sure you've installed Retto for your platform.
## Rust Library Quickstart

Here's a complete example of using Retto in a Rust application.

### Create a new Rust project

```bash
cargo new retto-example
cd retto-example
```

### Add Retto to `Cargo.toml`

```toml
[dependencies]
retto-core = { version = "0.1.5", features = ["serde", "backend-ort", "hf-hub"] }
anyhow = "1.0"
```
### Write your OCR code

Replace `src/main.rs` with:

```rust
use retto_core::prelude::*;
use std::fs;
use anyhow::Result;

fn main() -> Result<()> {
    // Configure the session
    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig {
            device: RettoOrtWorkerDevice::CPU,
            models: RettoOrtWorkerModelProvider::from_hf_hub_v4_default(),
        },
        max_side_len: 2000,
        min_side_len: 30,
        ..Default::default()
    };

    // Create OCR session
    let mut session = RettoSession::new(cfg)?;

    // Read image
    let image_data = fs::read("image.png")?;

    // Run OCR
    let result = session.run(image_data)?;

    // Print results
    println!("Found {} text regions", result.det_result.0.len());
    for (i, text) in result.rec_result.0.iter().enumerate() {
        println!(
            "Text {}: {} (confidence: {:.2})",
            i + 1,
            text.text,
            text.score
        );
    }

    Ok(())
}
```
### Run the example

```bash
# Place an image in the project directory
cargo run --release
```

On first run, models will be automatically downloaded from Hugging Face.
### Expected Output

```text
Found 3 text regions
Text 1: Hello World (confidence: 0.98)
Text 2: Welcome to Retto (confidence: 0.95)
Text 3: OCR Example (confidence: 0.97)
```
### Advanced: Streaming Results

For real-time feedback, use streaming mode to get results as each stage completes:

```rust
use retto_core::prelude::*;
use std::sync::mpsc;

fn main() -> anyhow::Result<()> {
    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig {
            device: RettoOrtWorkerDevice::CPU,
            models: RettoOrtWorkerModelProvider::from_hf_hub_v4_default(),
        },
        ..Default::default()
    };
    let mut session = RettoSession::new(cfg)?;
    let image_data = std::fs::read("image.png")?;

    let (tx, rx) = mpsc::channel();

    // Run in streaming mode
    session.run_stream(image_data, tx)?;

    // Receive results as they complete
    for stage in rx {
        match stage {
            RettoWorkerStageResult::Det(det) => {
                println!("Detection complete: {} regions found", det.0.len());
            }
            RettoWorkerStageResult::Cls(cls) => {
                println!("Classification complete: {} orientations detected", cls.0.len());
            }
            RettoWorkerStageResult::Rec(rec) => {
                println!("Recognition complete!");
                for text in &rec.0 {
                    println!("  - {} (score: {:.2})", text.text, text.score);
                }
            }
        }
    }

    Ok(())
}
```
## Using GPU Acceleration

### CUDA (NVIDIA)

Update `Cargo.toml`:

```toml
[dependencies]
retto-core = { version = "0.1.5", features = ["backend-ort-cuda"] }
```

Then select the CUDA device in your session config:

```rust
let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::Cuda(0), // Device ID 0
        models: RettoOrtWorkerModelProvider::from_hf_hub_v4_default(),
    },
    ..Default::default()
};
```

### DirectML (Windows)

Update `Cargo.toml`:

```toml
[dependencies]
retto-core = { version = "0.1.5", features = ["backend-ort-directml"] }
```

Then select the DirectML device:

```rust
let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::DirectML(0), // Device ID 0
        models: RettoOrtWorkerModelProvider::from_hf_hub_v4_default(),
    },
    ..Default::default()
};
```
## CLI Quickstart

The CLI tool is perfect for batch processing multiple images.

### Prepare your images

Create a directory with images to process:

```bash
mkdir images
# Add some images to the directory
```

### Run OCR on all images

```bash
# Process all images in the directory
retto-cli --use-hf-hub true -i ./images
```
### Expected Output

```text
INFO retto_cli: Using Hugging Face Hub for models
INFO retto_cli: Found 5 files, processing...
DEBUG retto_core: Det result: 3 regions detected
DEBUG retto_core: Cls result: 3 orientations classified
DEBUG retto_core: Rec result: 3 texts recognized
INFO retto_cli: Successfully processed 5 images, avg time: 234.56ms
```
### CLI Options

| Option | Description | Default |
|---|---|---|
| `--det-model-path` | Path to detection model | `ch_PP-OCRv4_det_infer.onnx` |
| `--cls-model-path` | Path to classification model | `ch_ppocr_mobile_v2.0_cls_infer.onnx` |
| `--rec-model-path` | Path to recognition model | `ch_PP-OCRv4_rec_infer.onnx` |
| `--rec-keys-path` | Path to character dictionary | `ppocr_keys_v1.txt` |
| `-i`, `--images` | Directory containing images | (required) |
| `--device` | Device type: `cpu`, `cuda`, `directml` | `cpu` |
| `--device-id` | GPU device ID | `0` |
| `--use-hf-hub` | Download models from Hugging Face | `true` |
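These options combine freely. For example, to run the same batch on an NVIDIA GPU (assuming `retto-cli` was built with the CUDA backend), only the flags listed in the table above are used:

```shell
# Process ./images on the first CUDA device, fetching models from Hugging Face
retto-cli --use-hf-hub true -i ./images --device cuda --device-id 0
```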
## WebAssembly Quickstart

Use Retto in the browser with WebAssembly.

### Install the package

```bash
npm install @nekoimageland/retto-wasm
```

### Create an HTML file

```html
<!DOCTYPE html>
<html>
<head>
  <title>Retto WASM Example</title>
</head>
<body>
  <h1>Retto OCR Demo</h1>
  <input type="file" id="imageInput" accept="image/*">
  <div id="results"></div>
  <script type="module" src="app.js"></script>
</body>
</html>
```
### Write the JavaScript code

Create `app.js`:

```javascript
import { Retto } from '@nekoimageland/retto-wasm';

async function main() {
  // Load the WebAssembly module
  console.log('Loading Retto...');
  const retto = await Retto.load((progress) => {
    console.log(`Loading: ${(progress * 100).toFixed(0)}%`);
  });

  // Fetch models (you'll need to host these)
  const models = {
    det_model: await fetch('/models/ch_PP-OCRv4_det_infer.onnx')
      .then(r => r.arrayBuffer()),
    cls_model: await fetch('/models/ch_ppocr_mobile_v2.0_cls_infer.onnx')
      .then(r => r.arrayBuffer()),
    rec_model: await fetch('/models/ch_PP-OCRv4_rec_infer.onnx')
      .then(r => r.arrayBuffer()),
    rec_dict: await fetch('/models/ppocr_keys_v1.txt')
      .then(r => r.arrayBuffer()),
  };

  // Initialize Retto
  await retto.init(models);
  console.log('Retto initialized!');

  // Handle image uploads
  document.getElementById('imageInput').addEventListener('change', async (e) => {
    const file = e.target.files[0];
    if (!file) return;

    const arrayBuffer = await file.arrayBuffer();
    const results = document.getElementById('results');
    results.innerHTML = '<p>Processing...</p>';

    // Run OCR with streaming results
    for await (const stage of retto.recognize(arrayBuffer)) {
      if (stage.stage === 'det') {
        results.innerHTML += `<p>✓ Detection: ${stage.result.length} regions found</p>`;
      } else if (stage.stage === 'cls') {
        results.innerHTML += `<p>✓ Classification: ${stage.result.length} orientations</p>`;
      } else if (stage.stage === 'rec') {
        results.innerHTML += '<p>✓ Recognition complete!</p><ul>';
        for (const text of stage.result) {
          results.innerHTML += `<li>${text.text} (${(text.score * 100).toFixed(1)}%)</li>`;
        }
        results.innerHTML += '</ul>';
      }
    }
  });
}

main().catch(console.error);
```
### Serve and test

Serve the files with a local development server, then open your browser and upload an image to see OCR results!
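Any static file server works; for example, with Python installed you can serve the project directory from the standard library (port 8000 is an arbitrary choice):

```shell
# Serve the current directory at http://localhost:8000
python3 -m http.server 8000
```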
### TypeScript Example

```typescript
import { Retto, type RettoModel, type RettoWorkerStage } from '@nekoimageland/retto-wasm';

async function runOCR(imageData: ArrayBuffer): Promise<string[]> {
  const retto = await Retto.load();

  // For embedded builds (if available)
  if (retto.is_embed_build) {
    await retto.init();
  } else {
    // Load models separately
    const models: RettoModel = {
      det_model: await fetchModel('det'),
      cls_model: await fetchModel('cls'),
      rec_model: await fetchModel('rec'),
      rec_dict: await fetchModel('dict'),
    };
    await retto.init(models);
  }

  const texts: string[] = [];
  for await (const stage of retto.recognize(imageData)) {
    if (stage.stage === 'rec') {
      texts.push(...stage.result.map(r => r.text));
    }
  }
  return texts;
}

async function fetchModel(type: string): Promise<ArrayBuffer> {
  const urls: Record<string, string> = {
    det: '/models/ch_PP-OCRv4_det_infer.onnx',
    cls: '/models/ch_ppocr_mobile_v2.0_cls_infer.onnx',
    rec: '/models/ch_PP-OCRv4_rec_infer.onnx',
    dict: '/models/ppocr_keys_v1.txt',
  };
  const response = await fetch(urls[type]);
  return response.arrayBuffer();
}
```
## Understanding the Results

All platforms return structured results from the OCR pipeline:

### Detection Results

```rust
pub struct DetProcessorInnerResult {
    pub boxes: PointBox, // Bounding box coordinates
    pub score: f32,      // Detection confidence (0.0-1.0)
}
```

Each detected text region includes:

- `boxes` - Four corner points defining the text region
- `score` - Confidence score of the detection
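Since detected regions can be rotated, the four corner points are often easier to consume as an axis-aligned rectangle (e.g. for cropping). A minimal self-contained sketch, assuming each box reduces to four `(x, y)` corner points; the `Rect` type and `bounding_rect` helper here are hypothetical illustrations, not part of Retto's API:

```rust
/// Hypothetical axis-aligned rectangle derived from a detection box.
#[derive(Debug, PartialEq)]
struct Rect {
    x: f32,
    y: f32,
    w: f32,
    h: f32,
}

/// Compute the tightest axis-aligned rectangle around four corner points.
fn bounding_rect(corners: [(f32, f32); 4]) -> Rect {
    let xs = corners.iter().map(|p| p.0);
    let ys = corners.iter().map(|p| p.1);
    let min_x = xs.clone().fold(f32::INFINITY, f32::min);
    let max_x = xs.fold(f32::NEG_INFINITY, f32::max);
    let min_y = ys.clone().fold(f32::INFINITY, f32::min);
    let max_y = ys.fold(f32::NEG_INFINITY, f32::max);
    Rect { x: min_x, y: min_y, w: max_x - min_x, h: max_y - min_y }
}

fn main() {
    // A slightly rotated text region
    let rect = bounding_rect([(10.0, 5.0), (110.0, 8.0), (108.0, 38.0), (8.0, 35.0)]);
    println!("{:?}", rect); // x: 8.0, y: 5.0, w: 102.0, h: 33.0
}
```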
### Classification Results

```rust
pub struct ClsProcessorSingleResult {
    pub label: ClsPostProcessLabel, // Orientation label
}

pub struct ClsPostProcessLabel {
    pub label: i32, // 0, 90, 180, or 270 degrees
    pub score: f32, // Classification confidence
}
```
### Recognition Results

```rust
pub struct RecProcessorSingleResult {
    pub text: String, // Recognized text
    pub score: f32,   // Recognition confidence
}
```
Scores range from 0.0 to 1.0, where higher values indicate greater confidence.
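In practice you may want to drop low-confidence lines before using them downstream. A minimal sketch using a hypothetical `RecLine` type that mirrors `RecProcessorSingleResult`; the 0.5 threshold is an arbitrary assumption you should tune for your data:

```rust
/// Hypothetical mirror of RecProcessorSingleResult, for illustration only.
struct RecLine {
    text: String,
    score: f32,
}

/// Keep only lines whose confidence meets the threshold.
fn filter_confident(lines: Vec<RecLine>, threshold: f32) -> Vec<RecLine> {
    lines.into_iter().filter(|l| l.score >= threshold).collect()
}

fn main() {
    let lines = vec![
        RecLine { text: "Hello World".into(), score: 0.98 },
        RecLine { text: "smudge".into(), score: 0.31 },
    ];
    let kept = filter_confident(lines, 0.5);
    for line in &kept {
        println!("{} ({:.2})", line.text, line.score); // only "Hello World" survives
    }
}
```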
## Next Steps

- **API Reference**: Explore the complete API documentation
- **Configuration**: Learn about advanced configuration options
- **Examples**: See more practical examples and use cases
- **Core Concepts**: Understand Retto's architecture and design