Overview
The Image backend (ImageDocumentBackend) processes standalone image files (JPEG, PNG, TIFF, etc.) by treating them as single-page or multi-page documents. Images run through the same pipeline that is used for PDFs, with OCR and layout analysis extracting text and structure; no intermediate PDF is created.
Features
- Multi-format support - JPEG, PNG, TIFF, BMP, WEBP, HEIC, GIF, ICO
- Multi-page TIFF - Handles multi-frame TIFF files
- Thread-safe processing - Eager frame extraction for parallel processing
- OCR integration - Automatic text extraction via pipeline
- Layout analysis - Structure detection using layout models
- Table extraction - Detects and extracts tables from images
- No PDF conversion - Native image processing without intermediate PDF
Supported Formats
Single-frame formats
- JPEG (.jpg, .jpeg)
- PNG (.png)
- BMP (.bmp)
- WEBP (.webp)
- HEIC (.heic)
Multi-frame formats
- TIFF (.tiff, .tif)
- GIF (.gif)
- ICO (.ico)
Usage
Basic Conversion
With Pipeline Options
Multi-page TIFF
Architecture
The Image backend implements the paginated backend interface.
Page Backend
Each page provides image access.
Pipeline Processing
Images are processed through the same pipeline as PDFs.
OCR Configuration
Images typically require OCR for text extraction.
Image Resolution
Higher resolution improves OCR accuracy:
- Low quality scans: Use images_scale=2.0 or higher
- High quality images: images_scale=1.0 may suffice
- Very large images: Consider downscaling to reduce processing time
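For the "very large images" case, the target size can be worked out before any pixels are touched. The sketch below is a minimal illustration, not part of the backend: the helper name `fit_within` and the `max_side` limit are hypothetical, and the right limit depends on your scans.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Return a size whose longest side is at most max_side, keeping the aspect ratio."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    factor = max_side / longest
    # Round to whole pixels, but never let a side collapse to zero.
    return max(1, round(width * factor)), max(1, round(height * factor))
```

The returned tuple can be passed to Pillow's `Image.resize` before the file is handed to the converter.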
Multi-page TIFF Handling
Multi-frame TIFF files are processed with eager frame extraction:
- Frames extracted eagerly to avoid PIL thread safety issues
- Each frame is an independent Image object
- Safe for concurrent page processing
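The idea behind eager extraction can be sketched with Pillow's frame API (`n_frames`, `seek`, `copy`). This illustrates the technique and is not the backend's actual code; the function names `extract_frames_eagerly` and `load_frames` are hypothetical.

```python
from typing import Any, List


def extract_frames_eagerly(image: Any) -> List[Any]:
    """Copy every frame of a (possibly multi-frame) Pillow image into its own object."""
    frames = []
    # Single-frame images may not expose n_frames, so fall back to 1.
    n_frames = getattr(image, "n_frames", 1)
    for index in range(n_frames):
        image.seek(index)            # move the shared file handle to this frame
        frames.append(image.copy())  # copy() loads the pixels into a separate object
    return frames


def load_frames(path: str) -> List[Any]:
    """Open an image file and return independent copies of all its frames."""
    from PIL import Image  # imported lazily so the helper above has no hard dependency

    with Image.open(path) as image:
        return extract_frames_eagerly(image)
```

Because all seeking happens up front in a single thread, the returned frames no longer share a file handle and can be handed to separate workers.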
Performance Considerations
Resolution vs Speed
Higher resolution improves OCR accuracy but increases processing time:
- images_scale=1.0: Fast, good for high-quality scans
- images_scale=2.0: Balanced (recommended)
- images_scale=3.0+: Slow, for very poor quality scans
Memory Usage
- Multi-page TIFFs load all frames into memory
- Large images at high scale factors use significant RAM
- Consider processing pages sequentially for memory constraints
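A rough estimate makes these points concrete. The sketch below assumes uncompressed 8-bit RGB pixels (3 bytes each) and assumes that images_scale multiplies both width and height; actual usage will be higher because of OCR and layout model overhead. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def estimate_frame_bytes(width: int, height: int,
                         images_scale: float = 1.0, channels: int = 3) -> int:
    """Approximate the raw pixel memory of one frame after scaling."""
    scaled_width = round(width * images_scale)
    scaled_height = round(height * images_scale)
    return scaled_width * scaled_height * channels


def estimate_document_bytes(frame_sizes: Iterable[Tuple[int, int]],
                            images_scale: float = 1.0, channels: int = 3) -> int:
    """Sum the estimate over all frames, since every frame is held in memory at once."""
    return sum(
        estimate_frame_bytes(width, height, images_scale, channels)
        for width, height in frame_sizes
    )
```

Under these assumptions, an A4 page scanned at 300 DPI (2480 x 3508 pixels) takes roughly 26 MB at images_scale=1.0 and roughly 104 MB at images_scale=2.0, so a 10-page TIFF at scale 2.0 needs about 1 GB for pixel data alone.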
OCR Engine Selection
- EasyOCR: Best accuracy, slower, GPU-accelerated
- Tesseract: Fast, good accuracy, CPU-only
- RapidOCR: Fastest, lower accuracy
Advanced Usage
Batch Image Processing
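A minimal sketch of batch processing with a thread pool follows. It deliberately takes the conversion step as a parameter: `convert_one` is a hypothetical stand-in for whatever call converts a single image in your setup, and `convert_batch` is not part of the backend.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable, Tuple


def convert_batch(
    paths: Iterable[str],
    convert_one: Callable[[str], Any],
    max_workers: int = 4,
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Convert many images in parallel and return (results, errors) keyed by path."""
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(convert_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one unreadable image should not stop the batch
                errors[path] = exc
    return results, errors
```

Setting max_workers=1 processes the images one at a time, which trades speed for a lower peak memory footprint. Threads help most when the OCR work runs in native code; for pure-Python workloads a process pool may be a better fit.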
Extract Specific Regions
Custom Pipeline
Limitations
Troubleshooting
Poor OCR results
Solutions:
- Increase image scale: images_scale=2.0 or 3.0
- Use better OCR engine: Switch to EasyOCR
- Preprocess image: Enhance contrast, remove noise
- Correct language: Set proper OCR language
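To show what "enhance contrast" means in practice, here is a linear contrast stretch on a small grayscale grid. It is a pure-Python illustration of the technique, not what the pipeline does internally; the function name `stretch_contrast` is hypothetical, and the input is assumed to be rows of 8-bit values.

```python
from typing import List


def stretch_contrast(pixels: List[List[int]]) -> List[List[int]]:
    """Rescale 8-bit grayscale values so the darkest becomes 0 and the brightest 255."""
    flat = [value for row in pixels for value in row]
    if not flat or min(flat) == max(flat):
        # Empty or perfectly flat image: there is no range to stretch.
        return [list(row) for row in pixels]
    low = min(flat)
    span = max(flat) - low
    return [[round((value - low) * 255 / span) for value in row] for row in pixels]
```

On real images, Pillow's `ImageOps.autocontrast` performs a comparable operation far more efficiently.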
Out of memory
Solutions:
- Reduce images_scale
- Process pages sequentially (not parallel)
- Reduce batch sizes
- Use smaller OCR batch sizes
Slow processing
Optimizations:
- Enable GPU acceleration for OCR
- Reduce images_scale if acceptable
- Use faster OCR engine (RapidOCR)
- Disable unused features (e.g., do_table_structure=False)
Multi-page TIFF issues
Check:
- Verify TIFF is actually multi-frame
- Ensure PIL/Pillow can read TIFF format
- Try saving individual frames separately
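The first check can be done without any imaging library by walking the TIFF's chain of image file directories (IFDs), where each IFD in the main chain normally corresponds to one frame. This is a sketch for classic TIFF only (BigTIFF uses a different header), and the function name `count_tiff_frames` is hypothetical.

```python
import struct


def count_tiff_frames(data: bytes) -> int:
    """Count the IFDs in the main chain of a classic TIFF byte string."""
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("missing TIFF byte-order mark")
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic number 43)")

    frames = 0
    seen = set()
    while offset != 0:
        if offset in seen or offset + 2 > len(data):
            raise ValueError("corrupt IFD chain")
        seen.add(offset)
        (entry_count,) = struct.unpack_from(endian + "H", data, offset)
        # Each IFD is: 2-byte entry count, 12 bytes per entry, 4-byte next-IFD offset.
        next_pointer = offset + 2 + 12 * entry_count
        if next_pointer + 4 > len(data):
            raise ValueError("truncated IFD")
        (offset,) = struct.unpack_from(endian + "I", data, next_pointer)
        frames += 1
    return frames
```

For a file on disk, pass its contents, for example `count_tiff_frames(open("scan.tiff", "rb").read())`, where `scan.tiff` is a placeholder path. A result of 1 means the file is a single-frame TIFF.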
Best Practices
- Use appropriate resolution: images_scale=2.0 for most scans
- Enable OCR: Always set do_ocr=True for images
- Set correct language: Configure OCR language for best results
- GPU acceleration: Use GPU for faster OCR if available
- Batch processing: Process multiple images in parallel
- Memory management: Monitor memory usage with large images
See Also
- Backends Overview - Backend architecture
- PDF Backend - PDF processing
- OCR Options - OCR configuration
- Pipeline Options - Pipeline settings