Recognition Processor

Overview

The Recognition Processor (RecProcessor) performs text recognition (OCR) on text images. It converts images of text into readable strings using character dictionaries and CTC (Connectionist Temporal Classification) decoding. Source: retto-core/src/processor/rec_processor.rs

RecProcessor

The main recognition processor that extracts text from images.

Constructor

pub fn new(config: &RecProcessorConfig, character: &RecCharacter) -> Self

config

&RecProcessorConfig

required

Recognition processor configuration

character

&RecCharacter

required

Character dictionary for text decoding. Maps model outputs to text characters.

Process Method

fn process<F>(
    &self,
    images: &Vec<ImageHelper>,
    worker_fun: F,
) -> RettoResult<RecProcessorResult>
where
    F: FnMut(Array4<f32>) -> RettoResult<Array3<f32>>

Processes a batch of text images to extract recognized text.

images

&Vec<ImageHelper>

required

Immutable reference to a vector of images to recognize. These should be cropped text regions (e.g., from detection results).

worker_fun

F: FnMut(Array4<f32>) -> RettoResult<Array3<f32>>

required

Worker function that runs model inference on preprocessed batches. Takes a 4D tensor (batch × channels × height × width) and returns a 3D tensor (batch × sequence × classes).

RecProcessorResult

struct

Recognition results containing recognized text and confidence scores for each image

RecProcessorConfig

Configuration structure for the recognition processor.

Fields

character_source

RecCharacterDictProvider

required

Where to load the character dictionary from (see RecCharacterDictProvider).

image_shape

[usize; 3]

default: [3, 48, 320]

Image size during recognition as [channels, height, width]. Text images are resized to this shape before processing.

batch_num

usize

default: 6

Batch size for recognition. Images are processed in batches of this size for efficiency.
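Image sizes and batch widths interact, and the calculation can be sketched in integer arithmetic. This is a minimal sketch modeled on PaddleOCR's reference recognizer preprocessing, whose defaults ([3, 48, 320], ppocr_keys_v1.txt) this page uses. It is an assumption, not retto's confirmed code, that each crop is scaled to the configured height and that the batch width is the widest scaled crop but never less than the configured width. The names scaled_width, batch_width, and padding_per_crop are hypothetical.

```rust
/// Width of a `w` x `h` crop after scaling it to `target_h` pixels tall,
/// preserving aspect ratio and rounding up (integer math avoids float drift).
/// Assumes `h > 0`.
pub fn scaled_width(w: usize, h: usize, target_h: usize) -> usize {
    (target_h * w + h - 1) / h // ceil(target_h * w / h)
}

/// Shared width for one batch (assumption): the widest scaled crop, but never
/// narrower than the width configured in `image_shape` ([channels, height, width]).
pub fn batch_width(crops: &[(usize, usize)], image_shape: [usize; 3]) -> usize {
    let [_, target_h, base_w] = image_shape;
    crops
        .iter()
        .map(|&(w, h)| scaled_width(w, h, target_h))
        .fold(base_w, usize::max)
}

/// Padding (in pixel columns) each `(width, height)` crop needs to reach the batch width.
pub fn padding_per_crop(crops: &[(usize, usize)], image_shape: [usize; 3]) -> Vec<usize> {
    let bw = batch_width(crops, image_shape);
    let target_h = image_shape[1];
    crops
        .iter()
        .map(|&(w, h)| bw - scaled_width(w, h, target_h))
        .collect()
}
```

Under these assumptions, a 100 × 50 crop with the default [3, 48, 320] scales to a width of 96. It therefore needs 224 columns of padding unless a wider crop shares its batch.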

Default Configuration

The default configuration depends on the target platform and the enabled Cargo features:

// With hf-hub feature (Hugging Face Hub)
let config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::HuggingFace {
            repo: "pk5ls20/PaddleModel".into(),
            model: "retto/onnx/ppocr_keys_v1.txt".into(),
        }
    ),
    image_shape: [3, 48, 320],
    batch_num: 6,
};

// Without hf-hub feature (local file)
let config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Path("ppocr_keys_v1.txt".into())
    ),
    image_shape: [3, 48, 320],
    batch_num: 6,
};

// WebAssembly with download-models feature (embedded)
let config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Blob(
            include_bytes!("../../models/ppocr_keys_v1.txt").to_vec()
        )
    ),
    image_shape: [3, 48, 320],
    batch_num: 6,
};

Example

use retto_core::processor::RecProcessorConfig;
use retto_core::processor::RecCharacterDictProvider;
use retto_core::worker::RettoWorkerModelSource;

// Use default configuration
let config = RecProcessorConfig::default();

// Custom configuration with local dictionary
let custom_config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Path("./my_dict.txt".into())
    ),
    image_shape: [3, 32, 256],
    batch_num: 8,
};

// Configuration with embedded dictionary
let embedded_config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Blob(
            include_bytes!("../dicts/chinese.txt").to_vec()
        )
    ),
    image_shape: [3, 48, 320],
    batch_num: 6,
};

RecCharacterDictProvider

Enum specifying the source of the character dictionary.

pub enum RecCharacterDictProvider {
    OutSide(RettoWorkerModelSource),
    Inline(),  // TODO: Load from ONNX model
}

OutSide

enum variant

Load the dictionary from an external source: a local file, the Hugging Face Hub, or an embedded byte blob. Takes a RettoWorkerModelSource parameter.

Inline

enum variant

Load dictionary from the ONNX model itself (not yet implemented).

RettoWorkerModelSource

Specifies the actual source location:

pub enum RettoWorkerModelSource {
    // Load from local file path
    Path(PathBuf),
    
    // Download from Hugging Face Hub
    HuggingFace { repo: String, model: String },
    
    // Load from embedded byte array
    Blob(Vec<u8>),
}

RecCharacter

Internal structure managing the character dictionary and decoding logic.

Constructor

pub fn new(
    dict: RecCharacterDictProvider,
    ignored_tokens: Vec<usize>,
) -> RettoResult<Self>

dict

RecCharacterDictProvider

required

Source provider for the character dictionary

ignored_tokens

Vec<usize>

required

List of token indices to ignore during decoding (e.g., padding tokens)

Dictionary Format

Character dictionaries are plain text files with one character per line:

的
一
是
不
了
...

The processor automatically adds special characters:

- "blank" is inserted at index 0 (CTC blank token)
- " " (space) is appended at the end
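The resulting index-to-character table can be sketched as follows. This is a minimal sketch that assumes the dictionary text is already decoded. It treats every line, including an empty one, as an entry. build_char_table is a hypothetical helper, not retto's API.

```rust
/// Build the table that maps model class indices to strings:
/// "blank" at index 0, then one entry per dictionary line, then a trailing " ".
pub fn build_char_table(dict_text: &str) -> Vec<String> {
    let mut table = vec![String::from("blank")];
    // `lines()` splits on '\n' and also strips a trailing '\r' (CRLF files).
    table.extend(dict_text.lines().map(String::from));
    table.push(String::from(" "));
    table
}
```

With the 6623-line ppocr_keys_v1.txt, this layout yields 6625 entries. That is two more than the number of dictionary lines.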

Decode Method

fn decode(
    &self,
    text_index: &Array2<usize>,
    text_prob: &Array2<f32>,
    wh_ratio_list: &[OrderedFloat<f32>],
    max_wh_ratio: OrderedFloat<f32>,
    remove_duplicate: bool,
    return_word_box: bool,  // Not yet implemented
) -> Vec<(String, f32)>

Internal method that decodes model outputs into text strings.

RecProcessorResult

Result structure containing recognition results for all processed images.

pub struct RecProcessorResult(pub Vec<RecProcessorSingleResult>);

Vec<RecProcessorSingleResult>

Vector of recognition results, one per input image in the same order as input

RecProcessorSingleResult

Recognition result for a single image.

pub struct RecProcessorSingleResult {
    pub text: String,
    pub score: f32,
}

text

String

The recognized text string extracted from the image

score

f32

Average confidence score for the recognized text (0.0 to 1.0), calculated as the mean probability of the characters kept after decoding. Higher values indicate more confident recognition.

Processing Pipeline

The recognition processor follows this pipeline:

Batch Preparation:
- Sort images by aspect ratio (width/height) in descending order
- Group images into batches of size batch_num
- Calculate maximum width/height ratio for each batch
Preprocessing (per batch):
- Resize each image to the target height from image_shape while preserving its aspect ratio
- Pad images narrower than the widest image in the batch so that every image in the batch has the same width
- Normalize pixel values
- Stack images into a batch tensor (4D array)
Model Inference:
- Pass preprocessed batch to the recognition model via worker_fun
- Model outputs character probability sequence (3D array: batch × sequence × classes)
Postprocessing (CTC Decoding):
- For each image in the batch:
  - Find the most likely character at each time step (argmax)
  - Remove duplicate consecutive characters
  - Remove blank tokens (index 0)
  - Remove ignored tokens
  - Map token indices to characters using the dictionary
  - Calculate average confidence score
- Store results maintaining original input order
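The batch-preparation and order-restoration steps above can be sketched with the standard library alone. This is a minimal sketch, not retto's implementation. Batch, plan_batches, and restore_order are hypothetical names, and images are represented only by their (width, height). It also assumes each batch's results come back in the same order as that batch's indices.

```rust
/// One planned batch: positions in the caller's input list, plus the largest
/// width/height ratio among them (used to size the batch tensor).
#[derive(Debug, PartialEq)]
pub struct Batch {
    pub indices: Vec<usize>,
    pub max_wh_ratio: f32,
}

/// Sort inputs by width/height ratio (descending), then cut into chunks of `batch_num`.
/// `sizes` holds (width, height) pairs; heights must be non-zero.
pub fn plan_batches(sizes: &[(u32, u32)], batch_num: usize) -> Vec<Batch> {
    assert!(batch_num > 0, "batch_num must be positive");
    let ratio = |i: usize| sizes[i].0 as f32 / sizes[i].1 as f32;
    let mut order: Vec<usize> = (0..sizes.len()).collect();
    // `total_cmp` is a valid total order on f32, even in the presence of NaN.
    order.sort_by(|&a, &b| ratio(b).total_cmp(&ratio(a)));
    order
        .chunks(batch_num)
        .map(|chunk| Batch {
            indices: chunk.to_vec(),
            max_wh_ratio: chunk.iter().map(|&i| ratio(i)).fold(0.0, f32::max),
        })
        .collect()
}

/// Put per-batch results back into the original input order.
pub fn restore_order<T: Clone>(batches: &[Batch], results: &[Vec<T>], n: usize) -> Vec<T> {
    let mut out: Vec<Option<T>> = vec![None; n];
    for (batch, res) in batches.iter().zip(results) {
        for (&idx, r) in batch.indices.iter().zip(res) {
            out[idx] = Some(r.clone());
        }
    }
    out.into_iter()
        .map(|r| r.expect("each input index appears in exactly one batch"))
        .collect()
}
```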

CTC Decoding

The processor uses CTC (Connectionist Temporal Classification) decoding:

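- Argmax Selection: Select the most probable class at each time step
- Duplicate Removal: Consecutive identical indices are merged before blanks are removed, so "hheelllloo" collapses to "helo". Producing "hello" requires a blank between the repeated letters, as in "hheell-lloo" (with "-" standing for the blank)
- Blank Removal: CTC blank tokens are removed
- Character Mapping: Token indices are mapped to actual characters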
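The argmax selection step can be sketched as follows, applied to one image's slice of the batch × sequence × classes output. It assumes the slice has been copied into nested Vecs rather than an ndarray view. argmax_steps is a hypothetical name, and ties resolve to the lower class index, which may differ from retto.

```rust
/// Argmax selection for one image: `probs` is that image's [sequence][classes]
/// slice of the model output. Returns the winning class index and its
/// probability at every time step (ties keep the lower index).
pub fn argmax_steps(probs: &[Vec<f32>]) -> (Vec<usize>, Vec<f32>) {
    probs
        .iter()
        .map(|step| {
            step.iter().copied().enumerate().fold(
                (0usize, f32::NEG_INFINITY),
                |best, (i, p)| if p > best.1 { (i, p) } else { best },
            )
        })
        .unzip()
}
```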

Example Decoding

Model output indices: [0, 15, 15, 8, 0, 12, 12, 0, 0]
After dedup:          [0, 15, 8, 0, 12, 0]
After blank removal:  [15, 8, 12]
Mapped to characters: ['h', 'e', 'l']  (assuming a dictionary where 15 → 'h', 8 → 'e', 12 → 'l')
Final text:           "hel"
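The remaining steps can be sketched as one function: duplicate merging, blank and ignored-token removal, character mapping, and score averaging. This sketch rests on several assumptions. ctc_collapse is a hypothetical name. The score of a merged run comes from its first time step. An empty result scores 0.0. The table must cover every index the model emits. Retto's decode method may differ in these details.

```rust
/// Collapse one image's argmax indices into text plus an average score.
/// `probs[t]` is the probability of `indices[t]`; `table` maps index -> string.
pub fn ctc_collapse(
    indices: &[usize],
    probs: &[f32],
    table: &[String],
    ignored: &[usize],
) -> (String, f32) {
    let mut text = String::new();
    let mut kept = Vec::new();
    for (t, &idx) in indices.iter().enumerate() {
        // 1. Merge repeats: skip an index equal to the previous time step's index.
        if t > 0 && indices[t - 1] == idx {
            continue;
        }
        // 2. Drop the CTC blank (index 0) and any ignored tokens.
        if idx == 0 || ignored.contains(&idx) {
            continue;
        }
        // 3. Map the index to its character and remember its probability.
        text.push_str(&table[idx]);
        kept.push(probs[t]);
    }
    // Mean over kept characters; 0.0 for empty output (assumption).
    let score = if kept.is_empty() {
        0.0
    } else {
        kept.iter().sum::<f32>() / kept.len() as f32
    };
    (text, score)
}
```

Duplicates are merged before blanks are removed. As a result, the indices [12, 12, 0, 12] decode to "ll" rather than "l".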

Example Usage

use retto_core::processor::{RecProcessor, RecProcessorConfig, RecCharacter};
use retto_core::image_helper::ImageHelper;

// Create configuration
let config = RecProcessorConfig::default();

// Load character dictionary
let character = RecCharacter::new(
    config.character_source.clone(),
    vec![],  // No ignored tokens
)?;

// Create processor
let processor = RecProcessor::new(&config, &character);

// Prepare images (e.g., cropped and oriented text regions)
let images: Vec<ImageHelper> = vec![/* ... */];

// Process images. The closure receives one preprocessed batch (Array4<f32>)
// and must return RettoResult<Array3<f32>>; `model` is a placeholder for
// your recognition model's inference backend.
let results = processor.process(&images, |batch| {
    model.run(batch)
})?;

// Access recognition results
for (i, result) in results.0.iter().enumerate() {
    println!("Image {}: text = '{}', confidence = {:.2}",
        i,
        result.text,
        result.score
    );
}

Integration with Detection and Classification

The recognition processor is typically the final stage in an OCR pipeline:

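use retto_core::processor::{DetProcessor, ClsProcessor, RecProcessor};
use retto_core::image_helper::ImageHelper;

// Assumes det_processor, cls_processor, and rec_processor are already constructed.
// `image`, `crop_region`, `model_det`, `model_cls`, and `model_rec` are placeholders
// for your input image, your cropping helper, and your model inference closures.

// 1. Detect text regions
let det_results = det_processor.process(image, model_det)?;

// 2. Crop detected regions
let mut crop_images: Vec<ImageHelper> = det_results.0
    .iter()
    .map(|det| crop_region(image, &det.boxes))
    .collect();

// 3. Classify and correct orientation (crop_images is updated in place)
let _cls_results = cls_processor.process(&mut crop_images, model_cls)?;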

// 4. Recognize text
let rec_results = rec_processor.process(&crop_images, model_rec)?;

// 5. Combine results
for (det, rec) in det_results.0.iter().zip(rec_results.0.iter()) {
    println!("Text at {:?}: '{}' (confidence: {:.2})",
        det.boxes,
        rec.text,
        rec.score
    );
}

Character Dictionaries

Standard Dictionaries

ppocr_keys_v1.txt: Default Chinese + English + numbers (6623 characters)

- Includes simplified Chinese characters
- English letters (uppercase and lowercase)
- Numbers and common punctuation
- Suitable for general Chinese OCR

Custom Dictionaries

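Create custom dictionaries for specific use cases. A dictionary file contains only the characters, one per line. Every line is read as a dictionary entry, including comment lines, so the labels below are not part of the files. A custom dictionary only works with a recognition model trained on that same dictionary.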

English letters only:
A
B
C
...
Z
a
b
c
...
z

Numbers only:
0
1
2
3
4
5
6
7
8
9

Specific domain (e.g., license plates):
A
B
C
...
京
沪
粤
...

Loading Custom Dictionaries

// From local file
let config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Path("./custom_dict.txt".into())
    ),
    ..Default::default()
};

// From Hugging Face Hub
let config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::HuggingFace {
            repo: "my-org/my-models".into(),
            model: "dicts/my_custom_dict.txt".into(),
        }
    ),
    ..Default::default()
};

// Embedded in binary
let config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Blob(
            include_bytes!("../dicts/custom.txt").to_vec()
        )
    ),
    ..Default::default()
};

Performance Considerations

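- batch_num: Larger batches improve throughput but use more memory. Tune it to your hardware.
- image_shape:
  - Height: Should match the input height the recognition model was trained with. Among compatible models, smaller heights (e.g., 32) are faster but may reduce accuracy for complex characters.
  - Width: Wider images (e.g., 320-640) accommodate longer text sequences.
- Dictionary Size: Models trained on smaller dictionaries (e.g., English only) have fewer output classes. They are typically faster, and often more accurate within their domain, than models trained on large dictionaries (e.g., full CJK). The dictionary must always match the model.
- Aspect Ratio Sorting: The processor automatically sorts images by aspect ratio so that images of similar width share a batch, which minimizes padding waste.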
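The benefit of aspect ratio sorting can be quantified by counting padding. This minimal sketch assumes every crop has already been scaled to the same height, so width alone stands in for aspect ratio. padding_waste and padding_waste_sorted are hypothetical helpers.

```rust
/// Total padding (in pixel columns) when same-height crops are grouped, in the
/// given order, into batches of `batch_num` and padded to each batch's widest crop.
pub fn padding_waste(widths: &[usize], batch_num: usize) -> usize {
    assert!(batch_num > 0, "batch_num must be positive");
    widths
        .chunks(batch_num)
        .map(|batch| {
            // chunks() never yields an empty slice, so max() is always Some.
            let widest = *batch.iter().max().unwrap();
            batch.iter().map(|&w| widest - w).sum::<usize>()
        })
        .sum()
}

/// The same count after sorting widths in descending order (widest first).
pub fn padding_waste_sorted(widths: &[usize], batch_num: usize) -> usize {
    let mut sorted = widths.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    padding_waste(&sorted, batch_num)
}
```

Consider widths [48, 384, 48, 384] with batch_num = 2. Unsorted batches pad 672 columns in total, while sorted batches pad none.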

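Common Use Cases

Each configuration below assumes a recognition model trained with the named dictionary at the given input height. The values are illustrative starting points, not tuned recommendations.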

Chinese OCR

let config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Path("ppocr_keys_v1.txt".into())
    ),
    image_shape: [3, 48, 320],
    batch_num: 6,
};

English-Only OCR

let config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Path("english_dict.txt".into())
    ),
    image_shape: [3, 32, 256],  // Smaller for Latin characters
    batch_num: 8,
};

Number Recognition (e.g., Credit Cards)

let config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Path("digits_dict.txt".into())
    ),
    image_shape: [3, 32, 192],  // Smaller for simple digits
    batch_num: 16,  // Larger batches for simple cases
};

License Plate Recognition

let config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Path("license_plate_dict.txt".into())
    ),
    image_shape: [3, 48, 168],  // License plates are short
    batch_num: 12,
};

Error Handling

Common errors and solutions:

Dictionary Not Found

// Error: Failed to load character dictionary
// Solution: Check that the dictionary file exists and is readable
let config = RecProcessorConfig {
    character_source: RecCharacterDictProvider::OutSide(
        RettoWorkerModelSource::Path("./dicts/ppocr_keys_v1.txt".into())
    ),
    ..Default::default()
};

Invalid UTF-8 in Dictionary

// Error: Invalid UTF-8 in dictionary blob
// Solution: Ensure dictionary file is UTF-8 encoded
// Use a text editor to re-save with UTF-8 encoding
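Validating a dictionary blob as UTF-8 before use gives a clearer error than a failed load. This is a minimal sketch that assumes the bytes come from a Blob source or a file read. parse_dict_bytes and its error message are hypothetical, not retto's API.

```rust
/// Decode a dictionary blob into one entry per line, reporting where
/// invalid UTF-8 starts instead of silently replacing bytes.
pub fn parse_dict_bytes(bytes: &[u8]) -> Result<Vec<String>, String> {
    let text = std::str::from_utf8(bytes).map_err(|e| {
        format!("invalid UTF-8 in dictionary at byte offset {}", e.valid_up_to())
    })?;
    // `lines()` also strips the '\r' of Windows-style line endings.
    Ok(text.lines().map(String::from).collect())
}
```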

Model Output Shape Mismatch

// Error: Model output classes don't match dictionary size
// Solution: Ensure your model is trained with the same dictionary
// Expected model output classes = dictionary lines + 2
// (the CTC blank at index 0 plus the appended space)
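A quick consistency check between a dictionary and a model's output shape can be sketched as follows. It applies the blank-plus-space rule from Dictionary Format, and the [batch, sequence, classes] layout matches the worker_fun contract. expected_num_classes and check_output_shape are hypothetical helpers.

```rust
/// Number of classes a recognition model's output should have for a dictionary
/// with `dict_lines` entries: +1 for the CTC blank at index 0, +1 for the appended space.
pub fn expected_num_classes(dict_lines: usize) -> usize {
    dict_lines + 2
}

/// Check the class axis of a `[batch, sequence, classes]` output shape.
pub fn check_output_shape(shape: [usize; 3], dict_lines: usize) -> Result<(), String> {
    let expected = expected_num_classes(dict_lines);
    let classes = shape[2];
    if classes == expected {
        Ok(())
    } else {
        Err(format!(
            "model outputs {classes} classes, but the dictionary implies {expected} \
             ({dict_lines} entries + blank + space)"
        ))
    }
}
```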
