
Overview

This page describes the base options classes that control Docling’s document processing pipelines. All pipeline-specific options inherit from PipelineOptions and extend it with additional capabilities.

PipelineOptions

Base configuration for all document processing pipelines.
from docling.datamodel.pipeline_options import PipelineOptions

options = PipelineOptions(
    document_timeout=120.0,
    enable_remote_services=False,
    allow_external_plugins=False
)

Parameters

document_timeout
float | None
default:"None"
Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.
accelerator_options
AcceleratorOptions
default:"AcceleratorOptions()"
Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models. See AcceleratorOptions for details.
enable_remote_services
bool
default:"False"
Allow the pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.
allow_external_plugins
bool
default:"False"
Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via the plugin system. Disabled by default for security.
artifacts_path
Path | str | None
default:"None"
Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use docling-tools models download to pre-fetch artifacts for offline operation or faster initialization.
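
As a rough illustration of how document_timeout might be used in practice, the sketch below passes the option to a converter and treats PARTIAL_SUCCESS as usable output. It assumes the DocumentConverter, PdfFormatOption, and InputFormat names from Docling's converter API, which this page does not document. The helpers should_keep and convert_with_timeout are hypothetical names introduced here. The Docling imports are deferred into the function so the status helper can be read and tested on its own.

```python
from pathlib import Path


def should_keep(status_name: str, accept_partial: bool = True) -> bool:
    """Return True if a conversion status is good enough to use.

    Hypothetical helper: status_name is the enum member name,
    for example "SUCCESS" or "PARTIAL_SUCCESS".
    """
    if status_name == "SUCCESS":
        return True
    return accept_partial and status_name == "PARTIAL_SUCCESS"


def convert_with_timeout(source: Path, timeout_s: float = 120.0):
    """Convert one PDF with a time limit; return the result or None."""
    # Deferred imports: assumed module locations in the Docling package.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    # document_timeout is inherited from PipelineOptions.
    options = PdfPipelineOptions(document_timeout=timeout_s)
    converter = DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
    # raises_on_error=False lets the caller inspect the status instead of
    # handling an exception.
    result = converter.convert(source, raises_on_error=False)
    return result if should_keep(result.status.name) else None
```

Whether partial results are acceptable depends on the application; passing accept_partial=False to should_keep treats a timed-out document as a failure.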

ConvertPipelineOptions

Base configuration for document conversion pipelines.
from docling.datamodel.pipeline_options import ConvertPipelineOptions

options = ConvertPipelineOptions(
    do_picture_description=True,
    do_picture_classification=True
)

Parameters

Inherits all parameters from PipelineOptions.
do_picture_classification
bool
default:"False"
Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.
picture_classification_options
DocumentPictureClassifierOptions
Configuration for picture classification model/runtime. Supports selecting transformers, onnxruntime, or remote api_kserve_v2 inference engines.
do_picture_description
bool
default:"False"
Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.
picture_description_options
PictureDescriptionBaseOptions
Configuration for picture description model. Uses new preset system (recommended). Default: ‘smolvlm’ preset. Example: PictureDescriptionVlmEngineOptions.from_preset('granite_vision')
do_chart_extraction
bool
default:"False"
Extract data in tabular format from bar charts, pie charts, and line charts.

PdfPipelineOptions

Configuration options for the PDF document processing pipeline.
from docling.datamodel.pipeline_options import EasyOcrOptions, PdfPipelineOptions

options = PdfPipelineOptions(
    do_ocr=True,
    do_table_structure=True,
    ocr_options=EasyOcrOptions(lang=["en"])
)

Parameters

Inherits all parameters from ConvertPipelineOptions.
do_table_structure
bool
default:"True"
Enable table structure extraction and reconstruction. Detects table regions, extracts cell content with row/column relationships, and reconstructs the logical table structure for downstream processing.
do_ocr
bool
default:"True"
Enable Optical Character Recognition for scanned or image-based PDFs. Replaces or supplements programmatic text extraction with OCR-detected text. Required for scanned documents with no embedded text layer. Note: OCR significantly increases processing time.
do_code_enrichment
bool
default:"False"
Enable specialized processing for code blocks. Applies code-aware OCR and formatting to improve accuracy of programming language snippets, terminal output, and structured code content.
do_formula_enrichment
bool
default:"False"
Enable mathematical formula recognition and LaTeX conversion. Uses specialized models to detect and extract mathematical expressions, converting them to LaTeX format for accurate representation.
force_backend_text
bool
default:"False"
Force use of PDF backend’s native text extraction instead of layout model predictions. When enabled, bypasses the layout model’s text detection and uses the embedded text from the PDF file directly. Useful for PDFs with reliable programmatic text layers.
table_structure_options
BaseTableStructureOptions
default:"TableStructureOptions()"
Configuration for table structure extraction. Controls table detection accuracy, cell matching behavior, and table formatting. Only applicable when do_table_structure=True. See TableStructureOptions.
ocr_options
OcrOptions
default:"OcrAutoOptions()"
Configuration for OCR engine. Specifies which OCR engine to use (Tesseract, EasyOCR, RapidOCR, etc.) and engine-specific settings. Only applicable when do_ocr=True. See OcrOptions.
layout_options
BaseLayoutOptions
default:"LayoutOptions()"
Configuration for document layout analysis model. Controls layout detection behavior including cluster creation for orphaned elements, cell assignment to table structures, and handling of empty regions. Specifies which layout model to use (default: Heron).
code_formula_options
CodeFormulaVlmOptions
Configuration for code and formula extraction using VLM. Uses new preset system (recommended). Default: ‘codeformulav2’ preset. Only applicable when do_code_enrichment=True or do_formula_enrichment=True.
images_scale
float
default:"1.0"
Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements. Recommended values:
  • 1.0 (standard quality)
  • 2.0 (high resolution)
  • 0.5 (lower resolution for previews)
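
To get a feel for what these values mean, the following sketch estimates rendered page dimensions. It assumes that a scale of 1.0 maps one PDF point (1/72 inch) to one pixel, which this page does not state. The helper page_pixels is hypothetical, and the US Letter size of 612 x 792 points is used only as sample input.

```python
def page_pixels(
    width_pt: float, height_pt: float, images_scale: float = 1.0
) -> tuple[int, int]:
    """Estimate rendered image size, assuming scale 1.0 == 72 DPI (1 pt -> 1 px)."""
    return round(width_pt * images_scale), round(height_pt * images_scale)


for scale in (0.5, 1.0, 2.0):
    w, h = page_pixels(612, 792, scale)
    # Pixel count grows with the square of the scale factor.
    print(f"scale={scale}: {w}x{h} px, {w * h / 1e6:.2f} megapixels")
```

Under this assumption, doubling the scale quadruples the pixel count, which is consistent with the note that higher values cost more processing time and storage.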
generate_page_images
bool
default:"False"
Generate rendered page images during extraction. Creates PNG representations of each page for visual preview, validation, or downstream image-based machine learning tasks.
generate_picture_images
bool
default:"False"
Extract and save embedded images from the PDF. Exports individual images (figures, photos, diagrams, charts) found in the document as separate image files for downstream use.
generate_parsed_pages
bool
default:"False"
Retain intermediate parsed page representations after processing. When enabled, keeps detailed page-level parsing data structures for debugging or advanced post-processing. Increases memory usage. Automatically disabled after document assembly unless explicitly enabled.

Batching Options (Threaded Pipeline)

ocr_batch_size
int
default:"4"
Batch size for OCR processing stage in threaded pipeline. Pages are grouped and processed together to improve throughput. Higher values increase GPU/CPU utilization but require more memory. Only used by StandardPdfPipeline (threaded mode).
layout_batch_size
int
default:"4"
Batch size for layout analysis stage in threaded pipeline. Pages are grouped and processed together by the layout model. Higher values improve GPU utilization but require more memory. Only used by StandardPdfPipeline (threaded mode).
table_structure_batch_size
int
default:"4"
Batch size for table structure extraction stage in threaded pipeline. Only used by StandardPdfPipeline (threaded mode).
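
The grouping idea behind these batch sizes can be sketched in a few lines. This is a toy model of page batching, not Docling's threaded pipeline; batches is a hypothetical helper introduced for illustration.

```python
from typing import Iterator, Sequence


def batches(pages: Sequence[int], batch_size: int = 4) -> Iterator[list[int]]:
    """Yield consecutive groups of at most batch_size pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        yield list(pages[start:start + batch_size])


# A 10-page document with the default batch size of 4.
for group in batches(list(range(1, 11))):
    print(group)
```

A larger batch size means fewer, bigger groups per stage, which is where the trade-off between utilization and memory comes from.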

PaginatedPipelineOptions

Configuration for pipelines processing paginated documents.
from docling.datamodel.pipeline_options import PaginatedPipelineOptions

options = PaginatedPipelineOptions(
    images_scale=2.0,
    generate_page_images=True
)

Parameters

Inherits all parameters from ConvertPipelineOptions.
images_scale
float
default:"1.0"
Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements.
generate_page_images
bool
default:"False"
Generate rendered page images during extraction.
generate_picture_images
bool
default:"False"
Extract and save embedded images from the document.

VlmPipelineOptions

Pipeline configuration for vision-language model based document processing.
from docling.datamodel.pipeline_options import VlmPipelineOptions, VlmConvertOptions

options = VlmPipelineOptions(
    vlm_options=VlmConvertOptions.from_preset("smoldocling")
)

Parameters

Inherits all parameters from PaginatedPipelineOptions.
generate_page_images
bool
default:"True"
Generate page images for VLM processing. Required for vision-language models to analyze document pages. Automatically enabled in VLM pipeline.
force_backend_text
bool
default:"False"
Force use of backend’s native text extraction instead of VLM predictions. When enabled, bypasses VLM text detection and uses embedded text from the document directly.
vlm_options
VlmConvertOptions | InlineVlmOptions | ApiVlmOptions
Vision-Language Model configuration for document understanding. Uses new VlmConvertOptions with preset system (recommended). Default: ‘granite_docling’ preset. Example: VlmConvertOptions.from_preset('smoldocling'). Legacy InlineVlmOptions/ApiVlmOptions still supported.

AsrPipelineOptions

Configuration options for the Automatic Speech Recognition (ASR) pipeline.
from docling.datamodel.pipeline_options import AsrPipelineOptions
from docling.datamodel import asr_model_specs

options = AsrPipelineOptions(
    asr_options=asr_model_specs.WHISPER_TINY
)
This pipeline processes audio files and converts speech to text using Whisper-based models. Supports various audio formats (MP3, WAV, FLAC, etc.) and video files with audio tracks.

Parameters

Inherits all parameters from PipelineOptions.
asr_options
InlineAsrOptions
default:"asr_model_specs.WHISPER_TINY"
Automatic Speech Recognition (ASR) model configuration for audio transcription. Specifies which ASR model to use (e.g., Whisper variants) and model-specific parameters for speech-to-text conversion.

Notes

Production Best Practices
  • Enabling multiple features (OCR, table structure, formulas) increases processing time significantly. Enable only necessary features for your use case.
  • For production systems processing large document volumes, implement timeout protection (90-120 seconds via document_timeout parameter).
  • OCR requires the selected engine (e.g., Tesseract or EasyOCR) to be installed. Verify the installation before enabling OCR via do_ocr=True.
  • RapidOCR has known issues with read-only filesystems (e.g., Databricks). Consider Tesseract or alternative backends for distributed systems.
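
One way to follow the first recommendation is to start with the costly stages switched off and enable only what a document class needs. The sketch below builds keyword arguments for PdfPipelineOptions using the parameter names documented on this page; pdf_feature_flags is a hypothetical helper, and the Docling call itself is shown only as a comment.

```python
def pdf_feature_flags(
    scanned: bool = False,
    tables: bool = False,
    formulas: bool = False,
    code: bool = False,
) -> dict[str, bool]:
    """Build PdfPipelineOptions keyword arguments with costly stages off by default."""
    return {
        "do_ocr": scanned,
        "do_table_structure": tables,
        "do_formula_enrichment": formulas,
        "do_code_enrichment": code,
    }


# Born-digital reports with tables: skip OCR and the enrichment models.
flags = pdf_feature_flags(tables=True)
print(flags)

# With Docling installed, the flags can be passed straight through:
# from docling.datamodel.pipeline_options import PdfPipelineOptions
# options = PdfPipelineOptions(document_timeout=120.0, **flags)
```

Note that do_ocr and do_table_structure default to True in PdfPipelineOptions, so they have to be set to False explicitly to skip those stages.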
