Overview
The Recognition Processor (RecProcessor) performs text recognition (OCR) on text images. It converts images of text into readable strings using character dictionaries and CTC (Connectionist Temporal Classification) decoding.
Source: retto-core/src/processor/rec_processor.rs
RecProcessor
The main recognition processor that extracts text from images.
Constructor
pub fn new(config: &RecProcessorConfig, character: &RecCharacter) -> Self
config (&RecProcessorConfig, required): Recognition processor configuration.
character (&RecCharacter, required): Character dictionary for text decoding. Maps model outputs to text characters.
Process Method
fn process<F>(
&self,
images: &Vec<ImageHelper>,
worker_fun: F,
) -> RettoResult<RecProcessorResult>
where
F: FnMut(Array4<f32>) -> RettoResult<Array3<f32>>
Processes a batch of text images to extract recognized text.
images (&Vec<ImageHelper>, required): Immutable reference to a vector of images to recognize. These should be cropped text regions (e.g., from detection results).
worker_fun (F, required): Worker function that runs model inference on preprocessed batches. Takes a 4D tensor (batch × channels × height × width) and returns a 3D tensor (batch × sequence × classes).
Returns a RettoResult<RecProcessorResult> containing the recognized text and confidence score for each input image.
RecProcessorConfig
Configuration structure for the recognition processor.
Fields
character_source (RecCharacterDictProvider, required): Specifies where to load the character dictionary from.
image_shape ([usize; 3], default: [3, 48, 320]): Image size during recognition as [channels, height, width]. Text images are resized to this shape before processing.
batch_num (default: 6): Batch size for recognition. Images are processed in batches of this size for efficiency.
Default Configuration
The default configuration varies by platform and features:
// With hf-hub feature (Hugging Face Hub)
let config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::HuggingFace {
repo: "pk5ls20/PaddleModel",
model: "retto/onnx/ppocr_keys_v1.txt",
}
),
image_shape: [3, 48, 320],
batch_num: 6,
};
// Without hf-hub feature (local file)
let config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::Path("ppocr_keys_v1.txt".into())
),
image_shape: [3, 48, 320],
batch_num: 6,
};
// WebAssembly with download-models feature (embedded)
let config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::Blob(
include_bytes!("../../models/ppocr_keys_v1.txt").to_vec()
)
),
image_shape: [3, 48, 320],
batch_num: 6,
};
Example
use retto_core::processor::RecProcessorConfig;
use retto_core::processor::RecCharacterDictProvider;
use retto_core::worker::RettoWorkerModelSource;
// Use default configuration
let config = RecProcessorConfig::default();
// Custom configuration with local dictionary
let custom_config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::Path("./my_dict.txt".into())
),
image_shape: [3, 32, 256],
batch_num: 8,
};
// Configuration with embedded dictionary
let embedded_config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::Blob(
include_bytes!("../dicts/chinese.txt").to_vec()
)
),
image_shape: [3, 48, 320],
batch_num: 6,
};
RecCharacterDictProvider
Enum specifying the source of the character dictionary.
pub enum RecCharacterDictProvider {
OutSide(RettoWorkerModelSource),
Inline(), // TODO: Load from ONNX model
}
OutSide: Loads the dictionary from an external source (local file, Hugging Face Hub, or embedded blob), specified by the wrapped RettoWorkerModelSource.
Inline: Loads the dictionary from the ONNX model itself (not yet implemented).
RettoWorkerModelSource
Specifies the actual source location:
pub enum RettoWorkerModelSource {
// Load from local file path
Path(PathBuf),
// Download from Hugging Face Hub
HuggingFace { repo: String, model: String },
// Load from embedded byte array
Blob(Vec<u8>),
}
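A loader might dispatch on this enum roughly as follows. This is a self-contained sketch with the enum redefined locally, and the Hugging Face arm stubbed out since downloading depends on the hf-hub feature; it is not the actual retto-core loader.

```rust
use std::path::PathBuf;

// Local redefinition for illustration; mirrors the enum documented above.
enum RettoWorkerModelSource {
    Path(PathBuf),
    HuggingFace { repo: String, model: String },
    Blob(Vec<u8>),
}

// Sketch: resolve a source to raw dictionary bytes.
fn load_bytes(source: &RettoWorkerModelSource) -> std::io::Result<Vec<u8>> {
    match source {
        RettoWorkerModelSource::Path(p) => std::fs::read(p),
        RettoWorkerModelSource::HuggingFace { repo, model } => {
            // In a real build this arm would download via the hf-hub feature.
            Err(std::io::Error::new(
                std::io::ErrorKind::Unsupported,
                format!("download not sketched: {repo}/{model}"),
            ))
        }
        RettoWorkerModelSource::Blob(bytes) => Ok(bytes.clone()),
    }
}

fn main() {
    let blob = RettoWorkerModelSource::Blob(b"a\nb\nc".to_vec());
    assert_eq!(load_bytes(&blob).unwrap(), b"a\nb\nc".to_vec());
}
```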
RecCharacter
Internal structure managing the character dictionary and decoding logic.
Constructor
pub fn new(
dict: RecCharacterDictProvider,
ignored_tokens: Vec<usize>,
) -> RettoResult<Self>
dict (RecCharacterDictProvider, required): Source provider for the character dictionary.
ignored_tokens (Vec<usize>, required): List of token indices to ignore during decoding (e.g., padding tokens).
Character dictionaries are plain text files with one character per line. The processor automatically adds two special characters:
- "blank" is inserted at index 0 (the CTC blank token)
- " " (space) is appended at the end
Decode Method
fn decode(
&self,
text_index: &Array2<usize>,
text_prob: &Array2<f32>,
wh_ratio_list: &[OrderedFloat<f32>],
max_wh_ratio: OrderedFloat<f32>,
remove_duplicate: bool,
return_word_box: bool, // Not yet implemented
) -> Vec<(String, f32)>
Internal method that decodes model outputs into text strings.
RecProcessorResult
Result structure containing recognition results for all processed images.
pub struct RecProcessorResult(pub Vec<RecProcessorSingleResult>);
0 (Vec<RecProcessorSingleResult>): Vector of recognition results, one per input image, in the same order as the input.
RecProcessorSingleResult
Recognition result for a single image.
pub struct RecProcessorSingleResult {
pub text: String,
pub score: f32,
}
text (String): The recognized text string extracted from the image.
score (f32): Average confidence score for the recognized text (0.0 to 1.0), calculated as the mean of the per-character probabilities. Higher values indicate more confident recognition.
Processing Pipeline
The recognition processor follows this pipeline:
1. Batch Preparation:
   - Sort images by aspect ratio (width/height) in descending order
   - Group images into batches of size batch_num
   - Calculate the maximum width/height ratio for each batch
2. Preprocessing (per batch):
   - Resize each image to the image_shape height while maintaining its aspect ratio
   - Pad images narrower than the widest image in the batch to match its width
   - Normalize pixel values
   - Stack images into a batch tensor (4D array)
3. Model Inference:
   - Pass the preprocessed batch to the recognition model via worker_fun
   - The model outputs a character probability sequence (3D array: batch × sequence × classes)
4. Postprocessing (CTC decoding), for each image in the batch:
   - Find the most likely character at each time step (argmax)
   - Merge duplicate consecutive characters
   - Remove blank tokens (index 0)
   - Remove ignored tokens
   - Map token indices to characters using the dictionary
   - Calculate the average confidence score
5. Store results in the original input order
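The batch-preparation step can be sketched in plain Rust, with index/ratio pairs standing in for ImageHelper values; the function name and shapes here are illustrative, not the retto-core API:

```rust
// Sketch of batch preparation: sort by aspect ratio (descending),
// then group into batches of batch_num, tracking each batch's max ratio.
fn prepare_batches(ratios: &[f32], batch_num: usize) -> Vec<(Vec<usize>, f32)> {
    // Pair each image index with its width/height ratio.
    let mut indexed: Vec<(usize, f32)> = ratios.iter().copied().enumerate().collect();
    // Sort descending by ratio (widest first) to minimize padding waste.
    indexed.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    indexed
        .chunks(batch_num)
        .map(|chunk| {
            let indices: Vec<usize> = chunk.iter().map(|(i, _)| *i).collect();
            // In a descending-sorted chunk the first element has the max ratio.
            let max_ratio = chunk[0].1;
            (indices, max_ratio)
        })
        .collect()
}

fn main() {
    let batches = prepare_batches(&[1.0, 4.0, 2.0, 3.0, 0.5], 2);
    // Sorted order of ratios: 4.0, 3.0, 2.0, 1.0, 0.5, chunked in pairs.
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[0].0, vec![1, 3]); // original indices of 4.0 and 3.0
    assert!((batches[0].1 - 4.0).abs() < 1e-6);
    assert!((batches[2].1 - 0.5).abs() < 1e-6);
}
```

Keeping track of the original indices is what allows step 5 of the pipeline to restore the input order after batched inference.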
CTC Decoding
The processor uses CTC (Connectionist Temporal Classification) decoding:
- Argmax Selection: Select the most probable character at each time step
- Duplicate Removal: Consecutive identical characters are merged ("hheelllloo" → "hello")
- Blank Removal: CTC blank tokens are removed
- Character Mapping: Token indices are mapped to actual characters
Example Decoding
Model output indices: [0, 15, 15, 8, 0, 12, 12, 0, 0]
After merging consecutive duplicates: [0, 15, 8, 0, 12, 0]
After blank removal: [15, 8, 12]
Mapped to characters (here 15 → 'h', 8 → 'e', 12 → 'l'): ['h', 'e', 'l']
Final text: "hel"
Note that duplicates are merged before blanks are removed; this is what lets a genuine double character survive, because its two occurrences are separated by a blank (e.g., [12, 0, 12] decodes to "ll", not "l").
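The decoding steps can be sketched as a standalone greedy CTC decoder in plain Rust. The token table below is made up for the example (15 → 'h', 8 → 'e', 12 → 'l'); this mirrors the logic described above, not the actual decode method in retto-core:

```rust
// Sketch of greedy CTC decoding: argmax is assumed already taken, so the
// input is one token index plus its probability per time step.
fn ctc_greedy_decode(
    indices: &[usize],
    probs: &[f32],
    table: &[char], // table[i] is the character for token i; index 0 is blank
) -> (String, f32) {
    let mut text = String::new();
    let mut kept_probs: Vec<f32> = Vec::new();
    let mut prev: Option<usize> = None;
    for (&idx, &p) in indices.iter().zip(probs) {
        // Merge consecutive duplicates first, then drop blanks (index 0).
        if prev != Some(idx) && idx != 0 {
            text.push(table[idx]);
            kept_probs.push(p);
        }
        prev = Some(idx);
    }
    // Confidence is the mean probability over the kept characters.
    let score = if kept_probs.is_empty() {
        0.0
    } else {
        kept_probs.iter().sum::<f32>() / kept_probs.len() as f32
    };
    (text, score)
}

fn main() {
    // Toy table where 15 → 'h', 8 → 'e', 12 → 'l' (other slots unused).
    let mut table = vec!['?'; 16];
    table[15] = 'h';
    table[8] = 'e';
    table[12] = 'l';
    let indices = [0, 15, 15, 8, 0, 12, 12, 0, 0];
    let probs = [0.9; 9];
    let (text, score) = ctc_greedy_decode(&indices, &probs, &table);
    assert_eq!(text, "hel");
    assert!((score - 0.9).abs() < 1e-6);
}
```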
Example Usage
use retto_core::processor::{RecProcessor, RecProcessorConfig, RecCharacter};
use retto_core::processor::RecCharacterDictProvider;
use retto_core::image_helper::ImageHelper;
// Create configuration
let config = RecProcessorConfig::default();
// Load character dictionary
let character = RecCharacter::new(
config.character_source.clone(),
vec![], // No ignored tokens
)?;
// Create processor
let processor = RecProcessor::new(&config, &character);
// Prepare images (e.g., cropped and oriented text regions)
let images: Vec<ImageHelper> = vec![/* ... */];
// Process images with model inference function
let results = processor.process(&images, |batch| {
// Run your recognition model inference here
model.run(batch)
})?;
// Access recognition results
for (i, result) in results.0.iter().enumerate() {
println!("Image {}: text = '{}', confidence = {:.2}",
i,
result.text,
result.score
);
}
Integration with Detection and Classification
The recognition processor is typically the final stage in an OCR pipeline:
use retto_core::processor::{DetProcessor, ClsProcessor, RecProcessor};
// 1. Detect text regions
let det_results = det_processor.process(image, model_det)?;
// 2. Crop detected regions
let mut crop_images: Vec<ImageHelper> = det_results.0
.iter()
.map(|det| crop_region(image, &det.boxes))
.collect();
// 3. Classify and correct orientation
let cls_results = cls_processor.process(&mut crop_images, model_cls)?;
// 4. Recognize text
let rec_results = rec_processor.process(&crop_images, model_rec)?;
// 5. Combine results
for (det, rec) in det_results.0.iter().zip(rec_results.0.iter()) {
println!("Text at {:?}: '{}' (confidence: {:.2})",
det.boxes,
rec.text,
rec.score
);
}
Character Dictionaries
Standard Dictionaries
ppocr_keys_v1.txt: Default Chinese + English + numbers (6623 characters)
- Includes simplified Chinese characters
- English letters (uppercase and lowercase)
- Numbers and common punctuation
- Suitable for general Chinese OCR
Custom Dictionaries
Create custom dictionaries for specific use cases:
# English letters only
A
B
C
...
Z
a
b
c
...
z
# Numbers only
0
1
2
3
4
5
6
7
8
9
# Specific domain (e.g., license plates)
A
B
C
...
京
沪
粤
...
Loading Custom Dictionaries
// From local file
let config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::Path("./custom_dict.txt".into())
),
..Default::default()
};
// From Hugging Face Hub
let config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::HuggingFace {
repo: "my-org/my-models".into(),
model: "dicts/my_custom_dict.txt".into(),
}
),
..Default::default()
};
// Embedded in binary
let config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::Blob(
include_bytes!("../dicts/custom.txt").to_vec()
)
),
..Default::default()
};
Performance Tuning
- batch_num: Larger batches improve throughput but require more memory. Adjust based on your hardware.
- image_shape:
  - Height: Smaller heights (e.g., 32) are faster but may reduce accuracy for complex characters
  - Width: Wider images (e.g., 320-640) support longer text sequences
- Dictionary Size: Smaller dictionaries (e.g., English only) are faster and more accurate than large dictionaries (e.g., full CJK)
- Aspect Ratio Sorting: The processor automatically sorts images by aspect ratio to minimize padding waste
Common Use Cases
Chinese OCR
let config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::Path("ppocr_keys_v1.txt".into())
),
image_shape: [3, 48, 320],
batch_num: 6,
};
English-Only OCR
let config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::Path("english_dict.txt".into())
),
image_shape: [3, 32, 256], // Smaller for Latin characters
batch_num: 8,
};
Number Recognition (e.g., Credit Cards)
let config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::Path("digits_dict.txt".into())
),
image_shape: [3, 32, 192], // Smaller for simple digits
batch_num: 16, // Larger batches for simple cases
};
License Plate Recognition
let config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::Path("license_plate_dict.txt".into())
),
image_shape: [3, 48, 168], // License plates are short
batch_num: 12,
};
Error Handling
Common errors and solutions:
Dictionary Not Found
// Error: Failed to load character dictionary
// Solution: Check that the dictionary file exists and is readable
let config = RecProcessorConfig {
character_source: RecCharacterDictProvider::OutSide(
RettoWorkerModelSource::Path("./dicts/ppocr_keys_v1.txt".into())
),
..Default::default()
};
Invalid UTF-8 in Dictionary
// Error: Invalid UTF-8 in dictionary blob
// Solution: Ensure dictionary file is UTF-8 encoded
// Use a text editor to re-save with UTF-8 encoding
Model Output Shape Mismatch
// Error: Model output classes don't match dictionary size
// Solution: Ensure your model is trained with the same dictionary
// Expected class count = dictionary characters + the special tokens the
// processor adds (the CTC blank at index 0 and the appended space)
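One way to catch this mismatch early is a sanity check before inference. This is a sketch assuming the two special tokens described in the RecCharacter section (the blank at index 0 plus the appended space); dict_len would come from the loaded dictionary file and class_count from the model's output shape:

```rust
// Sketch: verify the model's class dimension against the dictionary size.
// The processor adds a CTC blank at index 0 and appends a space, so the
// expected class count is the dictionary's character count plus two.
fn check_class_count(dict_len: usize, class_count: usize) -> Result<(), String> {
    let expected = dict_len + 2; // + blank + space
    if class_count == expected {
        Ok(())
    } else {
        Err(format!(
            "model outputs {class_count} classes, expected {expected} ({dict_len} dictionary characters + blank + space)"
        ))
    }
}

fn main() {
    // ppocr_keys_v1.txt has 6623 characters, so a matching model
    // should output 6625 classes.
    assert!(check_class_count(6623, 6625).is_ok());
    assert!(check_class_count(6623, 6623).is_err());
}
```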