
Introduction to Retto

Retto is a high-performance OCR (Optical Character Recognition) SDK built in Rust that provides PaddleOCR inference capabilities with WebAssembly support. It enables fast, accurate text detection and recognition across multiple platforms including native Rust applications, command-line tools, and web browsers.

What is Retto?

Retto is a complete OCR solution that implements the PaddleOCR v4 pipeline, consisting of three stages:
  1. Text Detection - Locates text regions in images
  2. Text Classification - Determines whether each detected text line is upright (0°) or upside down (180°)
  3. Text Recognition - Recognizes the actual text content
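
The three stages above can be pictured as a trait with one method per stage, driven by a small loop. This is only a sketch under assumptions: `OcrStages`, `TextBox`, `OcrLine`, `run_pipeline`, and `StubStages` are hypothetical names invented for illustration and are not Retto's API, and the stub returns canned values instead of running models.

```rust
/// Axis-aligned box in pixel coordinates (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
struct TextBox {
    x: u32,
    y: u32,
    width: u32,
    height: u32,
}

/// One text region after all three stages have run (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct OcrLine {
    region: TextBox,
    rotation_degrees: u16,
    text: String,
}

/// One method per pipeline stage, so real models or test stubs can be plugged in.
trait OcrStages {
    fn detect(&self, image: &[u8]) -> Vec<TextBox>;
    fn classify(&self, image: &[u8], region: &TextBox) -> u16;
    fn recognize(&self, image: &[u8], region: &TextBox, rotation_degrees: u16) -> String;
}

/// Runs detection once, then classification and recognition for every region.
fn run_pipeline<S: OcrStages>(stages: &S, image: &[u8]) -> Vec<OcrLine> {
    stages
        .detect(image)
        .into_iter()
        .map(|region| {
            let rotation_degrees = stages.classify(image, &region);
            let text = stages.recognize(image, &region, rotation_degrees);
            OcrLine { region, rotation_degrees, text }
        })
        .collect()
}

/// Stub that returns fixed values; a real implementation would run the models.
struct StubStages;

impl OcrStages for StubStages {
    fn detect(&self, _image: &[u8]) -> Vec<TextBox> {
        vec![
            TextBox { x: 10, y: 10, width: 200, height: 32 },
            TextBox { x: 10, y: 60, width: 180, height: 32 },
        ]
    }

    fn classify(&self, _image: &[u8], region: &TextBox) -> u16 {
        // Pretend that the lower region is upside down.
        if region.y > 50 { 180 } else { 0 }
    }

    fn recognize(&self, _image: &[u8], region: &TextBox, _rotation_degrees: u16) -> String {
        format!("{}x{} region at ({}, {})", region.width, region.height, region.x, region.y)
    }
}

fn main() {
    for line in run_pipeline(&StubStages, &[]) {
        println!("{:?}", line);
    }
}
```

Keeping the stages behind a trait makes the data flow explicit: detection produces regions, and each region then passes through classification and recognition in order.
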
The project provides three main components:
  • retto-core - Core Rust library with processor implementations
  • retto-cli - Command-line tool for batch OCR processing
  • retto-wasm - WebAssembly package for browser-based OCR

Why Use Retto?

High Performance

Built in Rust for speed and memory safety. Inference runs on ONNX Runtime, with CPU, CUDA, and DirectML execution providers available.

Cross-Platform

Run OCR natively on desktop, in command-line tools, or directly in web browsers via WebAssembly - all from the same codebase.

Multiple Backends

Supports multiple execution providers including CPU, CUDA (NVIDIA GPUs), and DirectML (Windows) for optimal performance on different hardware.

Flexible Model Loading

Load models from local files, memory buffers, or automatically download from Hugging Face Hub.

Key Features

  • PaddleOCR v4 Models - Implements the PaddleOCR v4 pipeline with detection, classification, and recognition
  • ONNX Runtime Integration - Leverages ONNX Runtime for efficient model inference
  • Streaming Results - Process OCR stages asynchronously with callback support
  • Batch Processing - Process multiple images efficiently with parallel processing
  • Type Safety - Full Rust type safety with comprehensive error handling
  • Serialization Support - Built-in JSON serialization for easy integration
  • Hardware Acceleration - Optional CUDA and DirectML support for GPU acceleration
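
To illustrate the streaming idea from the list above, the following sketch hands each stage's result to a caller-supplied callback as soon as that stage finishes. It is an assumption-laden illustration: `StageResult` and `run_with_callback` are hypothetical names, the values are placeholders, and the callback is invoked synchronously here, which may differ from how Retto schedules its callbacks.

```rust
/// Result of one pipeline stage (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum StageResult {
    Detection { region_count: usize },
    Classification { rotations: Vec<u16> },
    Recognition { lines: Vec<String> },
}

/// Calls `on_stage` once per stage, in pipeline order.
/// A real session would run a model before each call; this sketch emits placeholders.
fn run_with_callback<F: FnMut(StageResult)>(mut on_stage: F) {
    on_stage(StageResult::Detection { region_count: 2 });
    on_stage(StageResult::Classification { rotations: vec![0, 180] });
    on_stage(StageResult::Recognition {
        lines: vec!["hello".to_string(), "world".to_string()],
    });
}

fn main() {
    // The caller can react to early stages (for example, draw boxes)
    // before recognition has produced any text.
    let mut seen = Vec::new();
    run_with_callback(|stage| {
        println!("stage finished: {:?}", stage);
        seen.push(stage);
    });
    println!("received {} stage results", seen.len());
}
```
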

Architecture Overview

Retto processes images through a pipeline architecture:
  1. Image Helper - Handles image loading, resizing, and preprocessing
  2. Detection Processor - Uses a CNN model to detect text bounding boxes
  3. Classification Processor - Determines text orientation for each detected region
  4. Recognition Processor - Converts text regions into actual text using CTC decoding
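
The CTC decoding mentioned in the last step can be sketched as greedy decoding: take the highest-scoring class at each time step, collapse consecutive repeats, and drop blanks. This is a minimal sketch, not Retto's implementation. It assumes the blank symbol is class 0 and that class `i` maps to `alphabet[i - 1]`; the function name `ctc_greedy_decode` is made up for illustration.

```rust
/// Greedy CTC decoding over per-time-step class scores.
/// Assumption: class 0 is the CTC blank and class `i` maps to `alphabet[i - 1]`.
fn ctc_greedy_decode(scores: &[Vec<f32>], alphabet: &[char]) -> String {
    const BLANK: usize = 0;
    let mut text = String::new();
    let mut previous = BLANK;

    for step in scores {
        // Argmax over the classes of this time step (an empty step counts as blank).
        let mut best = BLANK;
        for (class, &score) in step.iter().enumerate() {
            if score > step[best] {
                best = class;
            }
        }

        // Emit a character only when the class is not blank and differs
        // from the class chosen at the previous time step.
        if best != BLANK && best != previous {
            if let Some(&ch) = alphabet.get(best - 1) {
                text.push(ch);
            }
        }
        previous = best;
    }
    text
}

fn main() {
    let alphabet = ['a', 'b'];
    // Per-step winners: a, a, blank, a, b, b
    let scores = vec![
        vec![0.1, 0.8, 0.1],
        vec![0.2, 0.7, 0.1],
        vec![0.9, 0.05, 0.05],
        vec![0.1, 0.6, 0.3],
        vec![0.1, 0.2, 0.7],
        vec![0.0, 0.1, 0.9],
    ];
    println!("{}", ctc_greedy_decode(&scores, &alphabet)); // prints "aab"
}
```

The blank between the second and third `a` is what lets a genuinely doubled letter survive the repeat-collapsing step.
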

Session-Based API

Retto uses a session-based design where you create a RettoSession with your desired configuration:
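let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        // Run inference on the CPU execution provider.
        device: RettoOrtWorkerDevice::CPU,
        // Use the default PaddleOCR v4 models from Hugging Face Hub.
        models: RettoOrtWorkerModelProvider::from_hf_hub_v4_default(),
    },
    // Side-length limits applied to the input image.
    max_side_len: 2000,
    min_side_len: 30,
    // Keep the defaults for every other field.
    ..Default::default()
};

// `?` propagates any error from session creation to the caller.
let mut session = RettoSession::new(cfg)?;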
The session manages model loading, resource allocation, and provides methods for synchronous and streaming inference.
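
The page does not spell out how `max_side_len` and `min_side_len` are applied. One common reading, modeled on typical PaddleOCR-style preprocessing and offered here purely as an assumption, is that the image is scaled down when its longer side exceeds the maximum and scaled up when its shorter side falls below the minimum, preserving aspect ratio. The helper `fit_side_lengths` below is hypothetical and not part of Retto.

```rust
/// Returns the (width, height) an image would be resized to under the assumed rule:
/// shrink if the longer side exceeds `max_side_len`, then enlarge if the shorter
/// side is still below `min_side_len`. Aspect ratio is preserved.
fn fit_side_lengths(width: u32, height: u32, min_side_len: u32, max_side_len: u32) -> (u32, u32) {
    // Degenerate images cannot be scaled meaningfully; return them unchanged.
    if width == 0 || height == 0 {
        return (width, height);
    }

    let (mut w, mut h) = (width as f64, height as f64);

    let longest = w.max(h);
    if longest > max_side_len as f64 {
        let scale = max_side_len as f64 / longest;
        w *= scale;
        h *= scale;
    }

    // Note: for extreme aspect ratios this upscaling can push the longer
    // side back above `max_side_len`; the sketch does not resolve that conflict.
    let shortest = w.min(h);
    if shortest < min_side_len as f64 {
        let scale = min_side_len as f64 / shortest;
        w *= scale;
        h *= scale;
    }

    (w.round().max(1.0) as u32, h.round().max(1.0) as u32)
}

fn main() {
    // Limits taken from the configuration shown above.
    let (min_side_len, max_side_len) = (30, 2000);
    for (width, height) in [(4000, 1000), (200, 10), (800, 600)] {
        let resized = fit_side_lengths(width, height, min_side_len, max_side_len);
        println!("{}x{} -> {}x{}", width, height, resized.0, resized.1);
    }
}
```
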

Use Cases

  • Extract text from scanned documents, receipts, and forms for archival and searchability.
  • Build browser-based OCR tools without server-side processing using the WebAssembly package.
  • Process large volumes of images efficiently using the CLI tool with parallel processing.
  • Integrate OCR capabilities into desktop applications with low-latency inference.

Getting Started

  1. Choose Your Platform - Decide whether you need the Rust library, CLI tool, or WebAssembly package based on your use case.
  2. Install Retto - Follow the installation guide to add Retto to your project.
  3. Run Your First OCR - Check out the quickstart guide to run your first OCR in minutes.

Retto is currently in active development (v0.1.5). The API may change in future versions.

License

Retto is licensed under the Apache License 2.0, making it suitable for both open source and commercial projects.

Next Steps

  • Installation - Learn how to install Retto in your project
  • Quickstart - Run your first OCR in minutes
