Overview
OCR (Optical Character Recognition) options configure text extraction from images and scanned documents. Docling supports multiple OCR engines, each with different characteristics in terms of accuracy, speed, language support, and platform compatibility.
OcrOptions (Base)
Base class for all OCR engine configurations.
Parameters
List of OCR languages to use. The format must match the values expected by the chosen OCR engine. Example: ["deu", "eng"]
If enabled, full-page OCR is always applied, even when programmatic text is available.
Minimum fraction of the page area that a bitmap must cover to be processed with OCR. Range: 0.0-1.0. Examples: 0.05, 0.1
OcrAutoOptions
Automatic OCR engine selection based on system availability.
Parameters
OCR engine type identifier.
The automatic OCR engine will use the default values of the selected engine. Specify the engine explicitly to change language selection.
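To illustrate what selection "based on system availability" can look like, the sketch below probes for installed Python packages and returns the first engine it finds. The candidate order, the module names, and the helper `pick_available_engine` are assumptions made for illustration; Docling's actual selection logic may differ.

```python
import importlib.util
import sys
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical (engine label, importable module) pairs in an assumed
# preference order. This is not Docling's documented order.
CANDIDATES: Sequence[Tuple[str, str]] = (
    ("ocrmac", "ocrmac"),
    ("rapidocr", "rapidocr_onnxruntime"),
    ("easyocr", "easyocr"),
    ("tesserocr", "tesserocr"),
)


def _is_installed(module: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module) is not None


def pick_available_engine(
    platform: str = sys.platform,
    is_installed: Callable[[str], bool] = _is_installed,
) -> Optional[str]:
    """Return the first candidate engine whose module is installed, else None."""
    for engine, module in CANDIDATES:
        # The native macOS engine is only usable on macOS ("darwin").
        if engine == "ocrmac" and platform != "darwin":
            continue
        if is_installed(module):
            return engine
    return None
```

The `is_installed` argument is injectable so the selection logic can be exercised without installing any OCR package.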
RapidOcrOptions
Configuration for the RapidOCR engine with multiple backend support.
Parameters
OCR engine type identifier.
List of OCR languages. Note: RapidOCR does not currently support language selection; this parameter is reserved for future compatibility.
Inference backend for RapidOCR:
- onnxruntime: Default, cross-platform
- openvino: Intel optimization
- paddle: PaddlePaddle
- torch: PyTorch
Minimum confidence score for text detection. Text regions with scores below this threshold are filtered out. Range: 0.0-1.0. Lower values detect more text but may include false positives.
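The effect of the score threshold can be sketched in a few lines. The `TextRegion` type, the `filter_by_score` helper, and the 0.5 default below are hypothetical stand-ins for illustration, not RapidOCR or Docling types.

```python
from typing import List, NamedTuple


class TextRegion(NamedTuple):
    """Hypothetical stand-in for one detected text region."""
    text: str
    score: float  # confidence in the range 0.0-1.0


def filter_by_score(regions: List[TextRegion], threshold: float = 0.5) -> List[TextRegion]:
    """Keep regions whose score is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within 0.0-1.0")
    # Regions scoring below the threshold are filtered out.
    return [region for region in regions if region.score >= threshold]


regions = [
    TextRegion("Invoice", 0.97),
    TextRegion("Tota1", 0.42),
    TextRegion("2024-01-31", 0.81),
]
kept_default = filter_by_score(regions)        # drops the 0.42 region
kept_lenient = filter_by_score(regions, 0.3)   # keeps all three regions
```

Lowering the threshold keeps the low-confidence "Tota1" region, which shows the trade-off between recall and false positives described above.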
Enable text detection stage. If None, uses RapidOCR default behavior.
Enable text direction classification stage. If None, uses RapidOCR default behavior.
Enable text recognition stage. If None, uses RapidOCR default behavior.
Enable verbose logging output from RapidOCR for debugging purposes.
Custom path to text detection model. If None, uses default RapidOCR model.
Custom path to text classification model. If None, uses default RapidOCR model.
Custom path to text recognition model. If None, uses default RapidOCR model.
Custom path to recognition keys file. If None, uses default RapidOCR keys.
Custom path to font file for text rendering in visualization.
Additional parameters to pass through to RapidOCR engine. Use this to override or extend default RapidOCR configuration with engine-specific options.
EasyOcrOptions
Configuration for the EasyOCR engine.
Parameters
OCR engine type identifier.
List of language codes for OCR. EasyOCR supports 80+ languages. Use ISO 639-1 codes (e.g., en, fr, de). Multiple languages can be specified for multilingual documents.
Enable GPU acceleration for EasyOCR. If None, automatically detects and uses GPU if available. Set to False to force CPU-only processing.
Minimum confidence score for text recognition. Text with confidence below this threshold is filtered out. Range: 0.0-1.0. Lower values include more text but may reduce accuracy.
Directory path for storing downloaded EasyOCR models. If None, uses default EasyOCR cache location. Useful for offline environments or custom model management.
Recognition network architecture to use:
- standard: Default, balanced performance
- craft: Higher accuracy
Allow automatic download of EasyOCR models on first use. Disable for offline environments where models must be pre-installed.
Suppress Metal Performance Shaders (MPS) warnings on macOS. Reduces console noise when using Apple Silicon GPUs with EasyOCR.
TesseractCliOcrOptions
Configuration for Tesseract OCR via the command-line interface.
Parameters
OCR engine type identifier.
List of Tesseract language codes. Use 3-letter ISO 639-2 codes (e.g., eng, fra, deu). Multiple languages enable multilingual OCR. Requires corresponding Tesseract language data files.
Command or path to the Tesseract executable. Use tesseract if it is in the system PATH, or provide the full path for custom installations. Example: /usr/local/bin/tesseract
Path to the Tesseract data directory containing language files. If None, uses Tesseract’s default TESSDATA_PREFIX location.
Page Segmentation Mode for Tesseract. Values 0-13 control how Tesseract segments the page. Common values:
- 3: Automatic (default)
- 6: Uniform block of text
- 11: Sparse text
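As a rough illustration of how these settings relate to the Tesseract command line, the sketch below assembles an argument list using Tesseract's `-l`, `--tessdata-dir`, and `--psm` flags. The helper `build_tesseract_command` and its parameter names are hypothetical; this is not necessarily how Docling invokes Tesseract internally.

```python
from typing import List, Optional, Sequence


def build_tesseract_command(
    image_path: str,
    output_base: str = "stdout",
    lang: Sequence[str] = ("eng",),
    tesseract_cmd: str = "tesseract",
    tessdata_dir: Optional[str] = None,
    psm: Optional[int] = None,
) -> List[str]:
    """Assemble a Tesseract CLI argument list (hypothetical helper)."""
    cmd = [tesseract_cmd, image_path, output_base]
    if lang:
        # Tesseract joins multiple languages with "+", e.g. "eng+deu".
        cmd += ["-l", "+".join(lang)]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", tessdata_dir]
    if psm is not None:
        if not 0 <= psm <= 13:
            raise ValueError("psm must be between 0 and 13")
        cmd += ["--psm", str(psm)]
    return cmd


# Multilingual OCR of a uniform text block, printing the result to stdout.
command = build_tesseract_command("page.png", lang=["eng", "deu"], psm=6)
```

The resulting list can be passed to `subprocess.run(command, capture_output=True, text=True)` on a machine where Tesseract and the requested language data files are installed.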
TesseractOcrOptions
Configuration for Tesseract OCR via Python bindings (tesserocr).
Parameters
OCR engine type identifier.
List of Tesseract language codes. Use 3-letter ISO 639-2 codes.
Path to Tesseract data directory containing language files.
Page Segmentation Mode for Tesseract. Values 0-13.
OcrMacOptions
Configuration for native macOS OCR using the Vision framework.
Parameters
OCR engine type identifier.
List of language locale codes for macOS OCR. Use the format language-REGION (e.g., en-US, fr-FR). Leverages the native macOS Vision framework for OCR on Apple platforms.
Recognition accuracy level:
- accurate: Higher quality, slower
- fast: Lower quality, faster
macOS framework to use for OCR. Currently supports vision (Apple Vision framework). Future versions may support additional frameworks.
Usage Examples
With Pipeline Options
Switching OCR Engines
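One practical detail when switching engines is that each engine expects a different language code format: ISO 639-1 for EasyOCR, 3-letter codes for Tesseract, and language-REGION locales for macOS. The sketch below translates a language list between those formats. The mapping table covers only three languages, and both the table and the helper `languages_for_engine` are illustrative assumptions, not part of Docling.

```python
from typing import Dict, List

# Illustrative mapping keyed by ISO 639-1 code. Extend as needed; the
# region chosen for each macOS locale is an assumption.
LANGUAGE_CODES: Dict[str, Dict[str, str]] = {
    "en": {"easyocr": "en", "tesseract": "eng", "ocrmac": "en-US"},
    "fr": {"easyocr": "fr", "tesseract": "fra", "ocrmac": "fr-FR"},
    "de": {"easyocr": "de", "tesseract": "deu", "ocrmac": "de-DE"},
}


def languages_for_engine(languages: List[str], engine: str) -> List[str]:
    """Translate ISO 639-1 codes into the format a given engine expects."""
    try:
        return [LANGUAGE_CODES[code][engine] for code in languages]
    except KeyError as exc:
        # Raised for an unknown language code or an unknown engine label.
        raise ValueError(f"No mapping for {exc.args[0]!r}") from exc


tesseract_langs = languages_for_engine(["en", "de"], "tesseract")
mac_langs = languages_for_engine(["en", "fr"], "ocrmac")
```

Keeping the translation in one place means the language list passed to an engine's options can be regenerated whenever the engine changes, rather than edited by hand.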
Comparison Table
| Engine | Speed | Accuracy | GPU Support | Languages | Platform |
|---|---|---|---|---|---|
| EasyOCR | Medium | High | Yes | 80+ | All |
| Tesseract | Fast | Medium-High | No | 100+ | All |
| RapidOCR | Very Fast | Medium | Backend-dependent | Limited | All |
| OcrMac | Fast | High | Yes (MPS) | macOS supported | macOS only |
Notes
Installation Requirements
Most OCR engines require additional system dependencies:
- Tesseract: Install via system package manager (apt-get install tesseract-ocr, brew install tesseract)
- EasyOCR: Automatically downloads models on first use
- RapidOCR: Requires rapidocr-onnxruntime or a backend-specific package
- OcrMac: Built-in on macOS, no installation needed
See Also
- Pipeline Options - Pipeline configuration
- PDF Backend - PDF processing architecture