Overview
Base options classes that control Docling’s document processing pipelines. All pipeline-specific options inherit from PipelineOptions and extend it with additional capabilities.
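The inheritance relationships stated across this page can be summarized as a small class tree. The sketch below uses empty stand-in classes that mirror only those relationships; they are not Docling's implementations, and the `lineage` helper is a hypothetical name introduced for illustration.

```python
from typing import List


# Stand-in classes mirroring only the "inherits from" statements on this page.
# They are NOT Docling's classes and deliberately carry no fields.
class PipelineOptions: ...
class ConvertPipelineOptions(PipelineOptions): ...
class PdfPipelineOptions(ConvertPipelineOptions): ...
class PaginatedPipelineOptions(ConvertPipelineOptions): ...
class VlmPipelineOptions(PaginatedPipelineOptions): ...
class AsrPipelineOptions(PipelineOptions): ...


def lineage(cls: type) -> List[str]:
    """Return class names from `cls` up to the base class, excluding `object`."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


# Print the chain of parents for each leaf options class.
for leaf in (PdfPipelineOptions, VlmPipelineOptions, AsrPipelineOptions):
    print(" -> ".join(lineage(leaf)))
```

Reading a chain left to right shows which sections of this page contribute parameters to a given options class.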
PipelineOptions
Base configuration for all document processing pipelines.
Parameters
Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced.
Recommended: 90-120 seconds for production systems.
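The timeout behaviour described above (stop when the budget is exceeded, keep what was already processed, report a partial status) can be illustrated with a minimal, standard-library sketch. This is not Docling's implementation: `Status`, `convert_with_deadline`, `process_page`, and the injectable `clock` are hypothetical names, and checking the deadline once per page is an assumption.

```python
import time
from enum import Enum
from typing import Callable, Iterable, List, Optional, Tuple


class Status(Enum):
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"


def convert_with_deadline(
    pages: Iterable[str],
    process_page: Callable[[str], str],
    document_timeout: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Status, List[str]]:
    """Process pages in order; stop early once the time budget is exceeded."""
    start = clock()
    results: List[str] = []
    for page in pages:
        # With document_timeout=None the deadline check is skipped entirely.
        if document_timeout is not None and clock() - start > document_timeout:
            # Budget exceeded: return what was finished so far.
            return Status.PARTIAL_SUCCESS, results
        results.append(process_page(page))
    return Status.SUCCESS, results


status, done = convert_with_deadline(["page 1", "page 2"], str.upper, document_timeout=120)
print(status, done)
```

Callers of a pipeline configured with a timeout should therefore check the returned status before assuming the output covers the whole document.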
Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models.
See AcceleratorOptions for details.
Allow pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.
Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.
Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use.
Use docling-tools models download to pre-fetch artifacts for offline operation or faster initialization.
ConvertPipelineOptions
Base configuration for document conversion pipelines.
Parameters
Inherits all parameters from PipelineOptions.
Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.
Configuration for picture classification model/runtime. Supports selecting transformers, onnxruntime, or remote api_kserve_v2 inference engines.
Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.
Configuration for picture description model. Uses new preset system (recommended).
Default: ‘smolvlm’ preset
Example: PictureDescriptionVlmEngineOptions.from_preset('granite_vision')
Extract data in tabular format from bar charts, pie charts, and line charts.
PdfPipelineOptions
Configuration options for the PDF document processing pipeline.
Parameters
Inherits all parameters from ConvertPipelineOptions.
Enable table structure extraction and reconstruction. Detects table regions, extracts cell content with row/column relationships, and reconstructs the logical table structure for downstream processing.
Enable Optical Character Recognition for scanned or image-based PDFs. Replaces or supplements programmatic text extraction with OCR-detected text. Required for scanned documents with no embedded text layer.
Note: OCR significantly increases processing time.
Enable specialized processing for code blocks. Applies code-aware OCR and formatting to improve accuracy of programming language snippets, terminal output, and structured code content.
Enable mathematical formula recognition and LaTeX conversion. Uses specialized models to detect and extract mathematical expressions, converting them to LaTeX format for accurate representation.
Force use of PDF backend’s native text extraction instead of layout model predictions. When enabled, bypasses the layout model’s text detection and uses the embedded text from the PDF file directly.
Useful for PDFs with reliable programmatic text layers.
Configuration for table structure extraction. Controls table detection accuracy, cell matching behavior, and table formatting.
Only applicable when do_table_structure=True. See TableStructureOptions.
Configuration for OCR engine. Specifies which OCR engine to use (Tesseract, EasyOCR, RapidOCR, etc.) and engine-specific settings.
Only applicable when do_ocr=True. See OcrOptions.
Configuration for document layout analysis model. Controls layout detection behavior including cluster creation for orphaned elements, cell assignment to table structures, and handling of empty regions.
Specifies which layout model to use (default: Heron).
Configuration for code and formula extraction using VLM. Uses new preset system (recommended).
Default: ‘codeformulav2’ preset
Only applicable when do_code_enrichment=True or do_formula_enrichment=True.
Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements.
Recommended values:
- 1.0 (standard quality)
- 2.0 (high resolution)
- 0.5 (lower resolution for previews)
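To make the cost of the scaling factor concrete, the sketch below computes output pixel dimensions for a US Letter page (612 x 792 PDF points) at each recommended value. It assumes that a scale of 1.0 maps one PDF point to one pixel (72 DPI); that mapping is an assumption, not something stated on this page, and the helper names are hypothetical.

```python
from typing import Tuple

# Assumption: scale 1.0 renders one pixel per PDF point (1 point = 1/72 inch).
LETTER_WIDTH_PT = 612
LETTER_HEIGHT_PT = 792


def scaled_page_size(width_pt: float, height_pt: float, scale: float) -> Tuple[int, int]:
    """Pixel dimensions of a page rendered at the given scaling factor."""
    return round(width_pt * scale), round(height_pt * scale)


def relative_pixel_count(scale: float) -> float:
    """Pixel count relative to scale 1.0; grows with the square of the scale."""
    return scale ** 2


for scale in (0.5, 1.0, 2.0):
    width_px, height_px = scaled_page_size(LETTER_WIDTH_PT, LETTER_HEIGHT_PT, scale)
    print(f"scale={scale}: {width_px}x{height_px} px, {relative_pixel_count(scale):.2f}x pixels")
```

Because the pixel count grows quadratically, doubling the scale roughly quadruples the image data each page produces, which is why higher values affect both processing time and storage.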
Generate rendered page images during extraction. Creates PNG representations of each page for visual preview, validation, or downstream image-based machine learning tasks.
Extract and save embedded images from the PDF. Exports individual images (figures, photos, diagrams, charts) found in the document as separate image files for downstream use.
Retain intermediate parsed page representations after processing. When enabled, keeps detailed page-level parsing data structures for debugging or advanced post-processing.
Increases memory usage. Automatically disabled after document assembly unless explicitly enabled.
Batching Options (Threaded Pipeline)
Batch size for OCR processing stage in threaded pipeline. Pages are grouped and processed together to improve throughput. Higher values increase GPU/CPU utilization but require more memory.
Only used by StandardPdfPipeline (threaded mode).
Batch size for layout analysis stage in threaded pipeline. Pages are grouped and processed together by the layout model. Higher values improve GPU utilization but require more memory.
Only used by StandardPdfPipeline (threaded mode).
Batch size for table structure extraction stage in threaded pipeline.
Only used by StandardPdfPipeline (threaded mode).
PaginatedPipelineOptions
Configuration for pipelines processing paginated documents.
Parameters
Inherits all parameters from ConvertPipelineOptions.
Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements.
Generate rendered page images during extraction.
Extract and save embedded images from the document.
VlmPipelineOptions
Pipeline configuration for vision-language model based document processing.
Parameters
Inherits all parameters from PaginatedPipelineOptions.
Generate page images for VLM processing. Required for vision-language models to analyze document pages. Automatically enabled in VLM pipeline.
Force use of backend’s native text extraction instead of VLM predictions. When enabled, bypasses VLM text detection and uses embedded text from the document directly.
Vision-Language Model configuration for document understanding. Uses the new VlmConvertOptions with preset system (recommended).
Default: ‘granite_docling’ preset
Example: VlmConvertOptions.from_preset('smoldocling')
Legacy InlineVlmOptions/ApiVlmOptions are still supported.
AsrPipelineOptions
Configuration options for the Automatic Speech Recognition (ASR) pipeline.
Parameters
Inherits all parameters from PipelineOptions.
Automatic Speech Recognition (ASR) model configuration for audio transcription. Specifies which ASR model to use (e.g., Whisper variants) and model-specific parameters for speech-to-text conversion.
Notes
Production Best Practices
- Enabling multiple features (OCR, table structure, formulas) increases processing time significantly. Enable only necessary features for your use case.
- For production systems processing large document volumes, implement timeout protection (90-120 seconds via the document_timeout parameter).
- OCR requires system installation of engines (Tesseract, EasyOCR). Verify installation before enabling OCR via do_ocr=True.
- RapidOCR has known issues with read-only filesystems (e.g., Databricks). Consider Tesseract or alternative backends for distributed systems.
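The first three practices above can be combined into a small pre-flight helper, sketched here with the standard library only. The keyword names document_timeout, do_ocr, and do_table_structure are the ones used on this page; `ocr_engine_available` and `build_pdf_option_kwargs` are hypothetical helpers, and looking for a `tesseract` executable on PATH or an importable `easyocr` package is an assumption about how those engines are installed.

```python
import importlib.util
import shutil
from typing import Any, Dict


def ocr_engine_available() -> bool:
    """Return True if a Tesseract binary or the easyocr package can be found."""
    has_tesseract = shutil.which("tesseract") is not None
    has_easyocr = importlib.util.find_spec("easyocr") is not None
    return has_tesseract or has_easyocr


def build_pdf_option_kwargs(
    need_ocr: bool, need_tables: bool, timeout_s: float = 120.0
) -> Dict[str, Any]:
    """Enable only the requested features and always set a timeout."""
    if need_ocr and not ocr_engine_available():
        # Fail fast at configuration time instead of during conversion.
        raise RuntimeError("OCR requested, but no Tesseract binary or easyocr package was found")
    return {
        "document_timeout": timeout_s,
        "do_ocr": need_ocr,
        "do_table_structure": need_tables,
    }


kwargs = build_pdf_option_kwargs(need_ocr=False, need_tables=True)
print(kwargs)
```

The resulting dictionary is intended to be passed as keyword arguments when constructing PdfPipelineOptions. Raising an error when OCR is requested but unavailable is one design choice; logging a warning and continuing without OCR is another.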
See Also
- OCR Options - OCR engine configuration
- Table Structure Options - Table extraction settings
- Accelerator Options - Hardware acceleration
- PDF Backend Options - PDF parsing configuration