
Overview

Retto provides extensive configuration options to customize OCR behavior. You can adjust:
  • Image size limits for processing
  • Detection threshold and sensitivity
  • Classification threshold for rotation detection
  • Recognition batch size and image dimensions
  • Hardware acceleration (CPU, CUDA, DirectML)

Configuration Structure

The RettoSessionConfig contains all configurable parameters:
pub struct RettoSessionConfig<W: RettoWorker> {
    // Worker configuration (models, device)
    pub worker_config: W::RettoWorkerConfig,
    
    // Image size limits
    pub max_side_len: usize,
    pub min_side_len: usize,
    
    // Stage-specific configurations
    pub det_processor_config: DetProcessorConfig,
    pub cls_processor_config: ClsProcessorConfig,
    pub rec_processor_config: RecProcessorConfig,
}
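Most examples on this page set only a few fields and fill in the rest with `..Default::default()`. The sketch below shows how Rust's struct update syntax behaves, using a hypothetical, simplified stand-in (`MiniSessionConfig` is not a Retto type; its default values 2000, 30 and 0.5 are the defaults stated later on this page). It assumes, as the examples here do, that the real config type implements `Default`.

```rust
// Hypothetical, flattened stand-in for RettoSessionConfig; not part of retto_core.
#[derive(Debug, Clone, PartialEq)]
struct MiniSessionConfig {
    max_side_len: usize,
    min_side_len: usize,
    det_box_thresh: f32,
}

impl Default for MiniSessionConfig {
    fn default() -> Self {
        // Values mirror the defaults documented on this page.
        Self { max_side_len: 2000, min_side_len: 30, det_box_thresh: 0.5 }
    }
}

fn main() {
    // Override one field; every other field is taken from Default::default().
    let cfg = MiniSessionConfig { max_side_len: 4000, ..Default::default() };
    println!("{cfg:?}");
}
```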

Basic Custom Configuration

Here’s a simple example with custom settings:
use retto_core::prelude::*;
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig::default(),
        // Increase max size for high-resolution images
        max_side_len: 3000,
        // Increase min size to filter very small text
        min_side_len: 50,
        // Use default processor configs
        det_processor_config: DetProcessorConfig::default(),
        cls_processor_config: ClsProcessorConfig::default(),
        rec_processor_config: RecProcessorConfig::default(),
    };

    let mut session = RettoSession::new(cfg)?;
    let img_bytes = fs::read("high_res_document.png")?;
    let result = session.run(img_bytes)?;

    println!("Detected {} text regions", result.det_result.0.len());
    Ok(())
}

Image Size Configuration

max_side_len and min_side_len

These parameters control how Retto resizes input images before processing:
let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig::default(),
    // Maximum length for the longest side (default: 2000)
    max_side_len: 4000,
    // Minimum length for the shortest side (default: 30)
    min_side_len: 100,
    ..Default::default()
};
Guidelines:
  • max_side_len: Higher values preserve detail but increase processing time
    • Use 2000-3000 for standard documents
    • Use 4000-6000 for high-resolution scans
    • Use 1000-1500 for fast processing of low-quality images
  • min_side_len: Filters out very small text regions
    • Use 30-50 for standard text
    • Use 100+ to ignore small noise
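Reading the comments in the example above literally (maximum length for the longest side, minimum length for the shortest side), one plausible way the two bounds combine into a resize is sketched below. This is an assumption modeled on RapidOCR-style preprocessing, not Retto's actual code: `resize_dims` is a hypothetical helper, it omits any rounding to model-friendly multiples, and it assumes images below `min_side_len` are scaled up.

```rust
/// Hypothetical helper: returns the (width, height) an image would be resized to.
/// Sketch only; Retto's real preprocessing may round or pad differently.
fn resize_dims(w: usize, h: usize, max_side_len: usize, min_side_len: usize) -> (usize, usize) {
    let (mut w, mut h) = (w as f64, h as f64);

    // Step 1: shrink so the longest side fits within max_side_len.
    let longest = w.max(h);
    if longest > max_side_len as f64 {
        let r = max_side_len as f64 / longest;
        w *= r;
        h *= r;
    }

    // Step 2 (assumed): enlarge so the shortest side reaches min_side_len.
    let shortest = w.min(h);
    if shortest > 0.0 && shortest < min_side_len as f64 {
        let r = min_side_len as f64 / shortest;
        w *= r;
        h *= r;
    }

    (w.round() as usize, h.round() as usize)
}

fn main() {
    // A 6000x1500 scan with the default limits (2000 / 30).
    println!("{:?}", resize_dims(6000, 1500, 2000, 30));
    // A tiny 200x20 image with min_side_len = 50.
    println!("{:?}", resize_dims(200, 20, 2000, 50));
}
```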

Detection Configuration

Customize text detection parameters:
use retto_core::prelude::*;
use ndarray::{Array1, Array2};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let det_config = DetProcessorConfig {
        // Input image processing
        limit_side_len: 960,  // Resize limit (default: 736)
        limit_type: LimitType::Max,  // Limit longest side
        
        // Detection thresholds
        threch: 0.25,         // Pixel threshold (default: 0.3)
        box_thresh: 0.45,     // Box confidence (default: 0.5)
        
        // Post-processing
        unclip_ratio: 1.8,    // Text box expansion (default: 1.6)
        use_dilation: true,   // Dilate detection mask
        min_mini_box_size: 5, // Min box size (default: 3)
        max_candidates: 1500, // Max detections (default: 1000)
        
        // Advanced settings
        score_mode: ScoreMode::Fast,  // Fast or Slow scoring
        mean: Array1::from_elem(3, 0.5),
        std: Array1::from_elem(3, 0.5),
        scale: 1.0 / 255.0,
        dilation_kernel: Some(Array2::from_elem((2, 2), 1)),
    };

    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig::default(),
        det_processor_config: det_config,
        ..Default::default()
    };

    let mut session = RettoSession::new(cfg)?;
    Ok(())
}
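The `mean`, `std` and `scale` fields describe per-channel input normalization. Assuming Retto follows the common PaddleOCR convention `normalized = (pixel * scale - mean) / std` (an assumption; check the crate source), the values above map 8-bit pixels into roughly [-1, 1]. The helpers below are hypothetical and treat the image as an interleaved RGB byte buffer.

```rust
/// Hypothetical helper: normalized = (pixel * scale - mean) / std_dev.
fn normalize_pixel(pixel: u8, scale: f32, mean: f32, std_dev: f32) -> f32 {
    (pixel as f32 * scale - mean) / std_dev
}

/// Applies the same normalization to an interleaved RGB buffer,
/// using one (mean, std_dev) pair per channel.
fn normalize_rgb(pixels: &[u8], scale: f32, mean: [f32; 3], std_dev: [f32; 3]) -> Vec<f32> {
    pixels
        .iter()
        .enumerate()
        .map(|(i, &p)| normalize_pixel(p, scale, mean[i % 3], std_dev[i % 3]))
        .collect()
}

fn main() {
    let scale = 1.0 / 255.0;
    let (mean, std_dev) = ([0.5; 3], [0.5; 3]);
    // Black, mid-gray and white pixels.
    let out = normalize_rgb(&[0, 0, 0, 128, 128, 128, 255, 255, 255], scale, mean, std_dev);
    println!("{out:?}");
}
```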

Key Detection Parameters

threch (f32, default: 0.3)
Pixel-level threshold for text detection. Lower values detect more text but may include noise.
  • 0.2-0.25: Aggressive detection, more false positives
  • 0.3-0.35: Balanced (recommended)
  • 0.4-0.5: Conservative, fewer false positives

box_thresh (f32, default: 0.5)
Minimum confidence score for a detected text box to be accepted.
  • 0.3-0.4: Accept more uncertain detections
  • 0.5: Balanced confidence
  • 0.6-0.7: High confidence only

unclip_ratio (f32, default: 1.6)
Expansion ratio for detected text boxes. Higher values include more context around the text.
  • 1.2-1.4: Tight boxes
  • 1.6: Standard (recommended)
  • 1.8-2.0: Loose boxes with more padding

min_mini_box_size (usize, default: 3)
Minimum side length of a detected box, in pixels. Filters out very small detections.
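To make the interplay of these parameters concrete, here is a simplified, hypothetical sketch of DB-style post-processing (the approach used by PaddleOCR's detector). It works on axis-aligned candidate boxes instead of the rotated polygons a real implementation extracts from contours, and `DetParams`, `BoxF` and the helper functions are illustrative names, not Retto APIs. The expansion step uses the DB unclip distance `area * unclip_ratio / perimeter`.

```rust
// Hypothetical mirror of the detection parameters; not Retto's DetProcessorConfig.
struct DetParams {
    threch: f32,
    box_thresh: f32,
    unclip_ratio: f32,
    min_mini_box_size: f32,
}

/// Axis-aligned box in pixel coordinates (real DB post-processing uses rotated polygons).
#[derive(Debug, Clone, Copy, PartialEq)]
struct BoxF {
    x0: f32,
    y0: f32,
    x1: f32,
    y1: f32,
}

/// Pixel-level threshold: a pixel counts as text if its probability exceeds `threch`.
fn binarize(prob: &[Vec<f32>], threch: f32) -> Vec<Vec<bool>> {
    prob.iter()
        .map(|row| row.iter().map(|&p| p > threch).collect())
        .collect()
}

/// Box score: mean probability over the pixels the box covers.
fn box_score(prob: &[Vec<f32>], b: &BoxF) -> f32 {
    let (mut sum, mut n) = (0.0f32, 0usize);
    for row in &prob[b.y0 as usize..b.y1 as usize] {
        for &p in &row[b.x0 as usize..b.x1 as usize] {
            sum += p;
            n += 1;
        }
    }
    if n == 0 { 0.0 } else { sum / n as f32 }
}

/// DB-style unclip: grow every side by area * ratio / perimeter.
/// (Real code would also clip the result to the image bounds.)
fn unclip(b: &BoxF, ratio: f32) -> BoxF {
    let (w, h) = (b.x1 - b.x0, b.y1 - b.y0);
    let d = w * h * ratio / (2.0 * (w + h));
    BoxF { x0: b.x0 - d, y0: b.y0 - d, x1: b.x1 + d, y1: b.y1 + d }
}

/// Size filter, then confidence filter, then expansion.
fn accept_box(prob: &[Vec<f32>], b: BoxF, p: &DetParams) -> Option<BoxF> {
    let short_side = (b.x1 - b.x0).min(b.y1 - b.y0);
    if short_side < p.min_mini_box_size || box_score(prob, &b) < p.box_thresh {
        return None;
    }
    Some(unclip(&b, p.unclip_ratio))
}

fn main() {
    // 4x8 probability map with a confident text strip in rows 1 and 2.
    let mut prob = vec![vec![0.1f32; 8]; 4];
    for row in &mut prob[1..3] {
        row.fill(0.8);
    }
    let p = DetParams { threch: 0.3, box_thresh: 0.5, unclip_ratio: 1.6, min_mini_box_size: 2.0 };
    let text_pixels = binarize(&prob, p.threch).iter().flatten().filter(|&&m| m).count();
    println!("text pixels: {text_pixels}");
    let candidate = BoxF { x0: 0.0, y0: 1.0, x1: 8.0, y1: 3.0 };
    println!("{:?}", accept_box(&prob, candidate, &p));
}
```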

Classification Configuration

Customize rotation detection:
use retto_core::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cls_config = ClsProcessorConfig {
        // Input image dimensions for classification
        image_shape: [3, 48, 192],  // [channels, height, width]
        
        // Processing batch size
        batch_num: 8,  // Process 8 images at once (default: 6)
        
        // Rotation confidence threshold
        thresh: 0.85,  // Lower = more aggressive rotation (default: 0.9)
        
        // Possible rotation angles
        label: vec![0, 180],  // 0° or 180° rotation
    };

    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig::default(),
        cls_processor_config: cls_config,
        ..Default::default()
    };

    let mut session = RettoSession::new(cfg)?;
    Ok(())
}

Key Classification Parameters

thresh (f32, default: 0.9)
Confidence threshold for applying a 180° rotation. A crop is rotated only if the classifier's confidence exceeds this value.
  • 0.7-0.8: Aggressive rotation correction
  • 0.9: Balanced (recommended)
  • 0.95+: Conservative, only rotate if very confident

batch_num (usize, default: 6)
Number of text crops classified together in one batch. Higher values use more memory but can improve throughput.
  • 2-4: Low memory usage
  • 6-8: Balanced
  • 10-16: High throughput (requires more RAM)
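The rotation rule described for `thresh`, together with the batching implied by `batch_num`, can be sketched as follows. The helpers are hypothetical: the rule (rotate only when the predicted label is 180 and its score exceeds `thresh`) follows the description above, while the `u32` label type and the row-major `Vec<Vec<u8>>` image layout are assumptions made for illustration.

```rust
/// Rotate a crop only if the classifier predicts 180° and its
/// confidence exceeds `thresh` (hypothetical helper).
fn should_rotate(label: u32, score: f32, thresh: f32) -> bool {
    label == 180 && score > thresh
}

/// Rotate a grayscale image by 180°: reverse the row order and each row.
fn rotate_180(img: &[Vec<u8>]) -> Vec<Vec<u8>> {
    img.iter()
        .rev()
        .map(|row| row.iter().rev().copied().collect())
        .collect()
}

/// Split `n` crops into index ranges of at most `batch_num` items each.
fn batch_ranges(n: usize, batch_num: usize) -> Vec<std::ops::Range<usize>> {
    let step = batch_num.max(1); // guard against a zero batch size
    (0..n).step_by(step).map(|start| start..(start + step).min(n)).collect()
}

fn main() {
    let crop = vec![vec![1u8, 2, 3], vec![4, 5, 6]];
    if should_rotate(180, 0.93, 0.9) {
        println!("{:?}", rotate_180(&crop));
    }
    // 14 crops with the default batch_num of 6.
    println!("{:?}", batch_ranges(14, 6));
}
```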

Recognition Configuration

Customize text recognition:
use retto_core::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let rec_config = RecProcessorConfig {
        // Character dictionary source
        character_source: RecCharacterDictProvider::OutSide(
            RettoWorkerModelSource::HuggingFace {
                repo: "pk5ls20/PaddleModel".into(),
                model: "retto/onnx/ppocr_keys_v1.txt".to_string(),
            }
        ),
        
        // Input image dimensions for recognition
        image_shape: [3, 48, 320],  // [channels, height, width]
        
        // Processing batch size
        batch_num: 8,  // Process 8 images at once (default: 6)
    };

    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig::default(),
        rec_processor_config: rec_config,
        ..Default::default()
    };

    let mut session = RettoSession::new(cfg)?;
    Ok(())
}
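`image_shape` and `batch_num` interact: crops in one batch share a single input tensor, so they are typically resized to the target height and padded to a common width. The sketch below assumes a PaddleOCR-style rule in which the batch width is the larger of `image_shape[2]` and the widest aspect-ratio-preserving crop; Retto's exact rule may differ, and `batch_widths` is a hypothetical helper.

```rust
/// For crops given as (width, height), compute each crop's width after resizing
/// to the target height, plus the padded width shared by the whole batch.
fn batch_widths(crops: &[(usize, usize)], image_shape: [usize; 3]) -> (Vec<usize>, usize) {
    let [_, target_h, base_w] = image_shape;
    let resized: Vec<usize> = crops
        .iter()
        // Keep the aspect ratio: ceil(target_h * w / h) in integer math.
        .map(|&(w, h)| (target_h * w + h - 1) / h)
        .collect();
    // The batch is at least base_w wide, wider if any crop needs more room.
    let padded = resized.iter().copied().fold(base_w, usize::max);
    (resized, padded)
}

fn main() {
    let batch = [(100, 25), (400, 20), (60, 30)];
    let (resized, padded) = batch_widths(&batch, [3, 48, 320]);
    println!("resized widths {resized:?}, padded to {padded}");
}
```

Under this assumption, one very wide crop widens every crop in its batch, which is one reason larger `batch_num` values can cost more memory.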

Custom Character Dictionary

Use a custom character dictionary for recognition:
use retto_core::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load from local file
    let rec_config = RecProcessorConfig {
        character_source: RecCharacterDictProvider::OutSide(
            RettoWorkerModelSource::Path("custom_dict.txt".into())
        ),
        image_shape: [3, 48, 320],
        batch_num: 6,
    };

    // Or load from Hugging Face
    let rec_config_hf = RecProcessorConfig {
        character_source: RecCharacterDictProvider::OutSide(
            RettoWorkerModelSource::HuggingFace {
                repo: "your-username/your-model".into(),
                model: "path/to/dict.txt".to_string(),
            }
        ),
        image_shape: [3, 48, 320],
        batch_num: 6,
    };

    let cfg = RettoSessionConfig {
        worker_config: RettoOrtWorkerConfig::default(),
        rec_processor_config: rec_config,
        ..Default::default()
    };

    let mut session = RettoSession::new(cfg)?;
    Ok(())
}
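A character dictionary is usually a plain-text file with one character per line. Assuming Retto decodes recognition output the way PaddleOCR's CTC decoder does (index 0 reserved for the CTC blank, index i mapping to line i-1 of the dictionary, consecutive repeats collapsed), the hypothetical sketch below shows how the dictionary turns per-timestep predictions into text. Whether Retto uses exactly this layout, or appends extra tokens such as a space, is not stated here.

```rust
/// Load a dictionary: one character (or token) per line.
/// `str::lines` also strips a trailing '\r', so CRLF files work too.
fn load_dict(contents: &str) -> Vec<String> {
    contents.lines().map(str::to_string).collect()
}

/// Greedy CTC decoding over per-timestep argmax indices.
/// Index 0 is the blank; index i maps to dict[i - 1].
fn ctc_greedy_decode(indices: &[usize], dict: &[String]) -> String {
    let mut out = String::new();
    let mut prev = None;
    for &idx in indices {
        // Skip blanks and repeats of the previous timestep's index.
        if idx != 0 && Some(idx) != prev {
            if let Some(token) = dict.get(idx - 1) {
                out.push_str(token);
            }
        }
        prev = Some(idx);
    }
    out
}

fn main() {
    let dict = load_dict("a\nb\nc\n");
    // Timesteps: a a <blank> a b b c
    let text = ctc_greedy_decode(&[1, 1, 0, 1, 2, 2, 3], &dict);
    println!("{text}");
}
```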

Hardware Acceleration

CPU (Default)

use retto_core::prelude::*;

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::CPU,
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};

CUDA (NVIDIA GPU)

use retto_core::prelude::*;

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::Cuda(0),  // GPU device ID
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};

DirectML (Windows)

use retto_core::prelude::*;

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::DirectML(0),  // Device ID
        models: RettoOrtWorkerModelProvider::default(),
    },
    ..Default::default()
};
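If you want to choose the device at runtime (for example from a command-line argument) instead of hard-coding it, you could parse a string into a device choice first. The sketch below uses a hypothetical local enum `DeviceChoice` that mirrors the three variants shown above; converting it into `RettoOrtWorkerDevice` is left out because the variants' payload types are not documented here, and the accepted spellings ("cpu", "cuda:N", "directml:N") are an illustrative choice.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the device options above (not a Retto type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DeviceChoice {
    Cpu,
    Cuda(i32),
    DirectML(i32),
}

impl FromStr for DeviceChoice {
    type Err = String;

    /// Accepts "cpu", "cuda", "cuda:N", "directml" or "directml:N" (case-insensitive).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim().to_ascii_lowercase();
        let (kind, id) = match s.split_once(':') {
            Some((k, n)) => {
                let id = n
                    .parse::<i32>()
                    .map_err(|e| format!("invalid device id {n:?}: {e}"))?;
                (k, id)
            }
            None => (s.as_str(), 0), // default device ID
        };
        match kind {
            "cpu" => Ok(DeviceChoice::Cpu),
            "cuda" => Ok(DeviceChoice::Cuda(id)),
            "directml" => Ok(DeviceChoice::DirectML(id)),
            other => Err(format!("unknown device {other:?}")),
        }
    }
}

fn main() {
    // e.g. `my-ocr-tool cuda:1`; falls back to "cpu" when no argument is given.
    let arg = std::env::args().nth(1).unwrap_or_else(|| "cpu".to_string());
    match arg.parse::<DeviceChoice>() {
        Ok(dev) => println!("selected {dev:?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```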

Complete Custom Configuration Example

Here’s a complete example with all customizations:
use retto_core::prelude::*;
use ndarray::{Array1, Array2};
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure detection
    let det_config = DetProcessorConfig {
        limit_side_len: 960,
        limit_type: LimitType::Max,
        threch: 0.25,
        box_thresh: 0.45,
        unclip_ratio: 1.8,
        use_dilation: true,
        min_mini_box_size: 5,
        max_candidates: 1500,
        score_mode: ScoreMode::Fast,
        mean: Array1::from_elem(3, 0.5),
        std: Array1::from_elem(3, 0.5),
        scale: 1.0 / 255.0,
        dilation_kernel: Some(Array2::from_elem((2, 2), 1)),
    };

    // Configure classification
    let cls_config = ClsProcessorConfig {
        image_shape: [3, 48, 192],
        batch_num: 8,
        thresh: 0.85,
        label: vec![0, 180],
    };

    // Configure recognition
    let rec_config = RecProcessorConfig {
        character_source: RecCharacterDictProvider::OutSide(
            RettoWorkerModelSource::HuggingFace {
                repo: "pk5ls20/PaddleModel".into(),
                model: "retto/onnx/ppocr_keys_v1.txt".to_string(),
            }
        ),
        image_shape: [3, 48, 320],
        batch_num: 8,
    };

    // Configure worker: use CUDA when built with the `backend-ort-cuda` feature
    // (a compile-time choice, not a runtime check for a GPU)
    #[cfg(feature = "backend-ort-cuda")]
    let worker_config = RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::Cuda(0),
        models: RettoOrtWorkerModelProvider::default(),
    };

    #[cfg(not(feature = "backend-ort-cuda"))]
    let worker_config = RettoOrtWorkerConfig {
        device: RettoOrtWorkerDevice::CPU,
        models: RettoOrtWorkerModelProvider::default(),
    };

    // Create session with custom configuration
    let cfg = RettoSessionConfig {
        worker_config,
        max_side_len: 3000,
        min_side_len: 50,
        det_processor_config: det_config,
        cls_processor_config: cls_config,
        rec_processor_config: rec_config,
    };

    let mut session = RettoSession::new(cfg)?;

    // Process image
    let img_bytes = fs::read("document.png")?;
    let result = session.run(img_bytes)?;

    println!("Processed with custom configuration:");
    println!("  Detected regions: {}", result.det_result.0.len());
    println!("  Recognized text: {} items", result.rec_result.0.len());

    for (idx, rec) in result.rec_result.0.iter().enumerate() {
        println!("  {}. {} ({:.1}%)", idx + 1, rec.text, rec.score * 100.0);
    }

    Ok(())
}

Configuration Presets

Here are recommended presets for common use cases:

High Accuracy (Slow)

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig::default(),
    max_side_len: 4000,
    min_side_len: 30,
    det_processor_config: DetProcessorConfig {
        limit_side_len: 960,
        threch: 0.25,
        box_thresh: 0.45,
        unclip_ratio: 1.8,
        score_mode: ScoreMode::Slow,  // More accurate scoring
        ..Default::default()
    },
    cls_processor_config: ClsProcessorConfig {
        thresh: 0.85,
        batch_num: 4,
        ..Default::default()
    },
    rec_processor_config: RecProcessorConfig {
        batch_num: 4,
        ..Default::default()
    },
};

Fast Processing (Lower Accuracy)

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig::default(),
    max_side_len: 1500,
    min_side_len: 50,
    det_processor_config: DetProcessorConfig {
        limit_side_len: 640,
        threch: 0.35,
        box_thresh: 0.55,
        unclip_ratio: 1.5,
        score_mode: ScoreMode::Fast,
        ..Default::default()
    },
    cls_processor_config: ClsProcessorConfig {
        thresh: 0.9,
        batch_num: 12,
        ..Default::default()
    },
    rec_processor_config: RecProcessorConfig {
        batch_num: 12,
        ..Default::default()
    },
};

Balanced (Default)

let cfg = RettoSessionConfig {
    worker_config: RettoOrtWorkerConfig::default(),
    max_side_len: 2000,
    min_side_len: 30,
    det_processor_config: DetProcessorConfig::default(),
    cls_processor_config: ClsProcessorConfig::default(),
    rec_processor_config: RecProcessorConfig::default(),
};

Next Steps

Basic OCR

Learn the fundamentals of OCR with Retto

Streaming OCR

Process stages with real-time callbacks
