PipelineOptions

Overview

Base options classes that control Docling’s document processing pipelines. All pipeline-specific options inherit from PipelineOptions and extend it with additional capabilities.

PipelineOptions

Base configuration for all document processing pipelines.

from docling.datamodel.pipeline_options import PipelineOptions

options = PipelineOptions(
    document_timeout=120.0,
    enable_remote_services=False,
    allow_external_plugins=False
)

Parameters

document_timeout

float | None

default:"None"

Maximum processing time in seconds before aborting document conversion. When exceeded, the pipeline stops processing and returns partial results with PARTIAL_SUCCESS status. If None, no timeout is enforced. Recommended: 90-120 seconds for production systems.
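The following is a minimal sketch of how a caller might react to the PARTIAL_SUCCESS status described above. It assumes a PDF input wired through DocumentConverter and PdfFormatOption; the function name convert_with_timeout is hypothetical, and the imports are done lazily inside the function so that merely defining it does not load Docling.

```python
def convert_with_timeout(source: str, timeout_s: float = 120.0):
    """Convert one PDF with a per-document timeout (illustrative helper)."""
    # Lazy imports: Docling is only needed when the function is actually called.
    from docling.datamodel.base_models import ConversionStatus, InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions(document_timeout=timeout_s)
    converter = DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )

    # raises_on_error=False returns a result object instead of raising on failure.
    result = converter.convert(source, raises_on_error=False)

    if result.status == ConversionStatus.PARTIAL_SUCCESS:
        # The timeout (or a page-level failure) cut processing short.
        print(f"{source}: partial result, keeping what was converted")
    elif result.status != ConversionStatus.SUCCESS:
        print(f"{source}: conversion failed with status {result.status}")
        return None
    return result.document
```

Whether a partial document is acceptable depends on the application; a stricter caller could treat PARTIAL_SUCCESS the same as a failure and return None.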

accelerator_options

AcceleratorOptions

default:"AcceleratorOptions()"

Hardware acceleration configuration for model inference. Controls GPU device selection, memory management, and execution optimization settings for layout, OCR, and table structure models. See AcceleratorOptions for details.

enable_remote_services

bool

default:"False"

Allow the pipeline to call external APIs or cloud services during processing. Required for API-based picture description models. Disabled by default for security and offline operation.

allow_external_plugins

bool

default:"False"

Allow loading external third-party plugins for OCR, layout, table structure, or picture description models. Enables custom model implementations via plugin system. Disabled by default for security.

artifacts_path

Path | str | None

default:"None"

Local directory containing pre-downloaded model artifacts (weights, configs). If None, models are fetched from remote sources on first use. Use docling-tools models download to pre-fetch artifacts for offline operation or faster initialization.
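As a rough illustration of the artifacts_path behaviour described above, the sketch below decides what value to pass. The helper resolve_artifacts_path is hypothetical and not part of Docling; it returns the directory only when it exists and is non-empty, and otherwise returns None so that models can be fetched remotely on first use.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_artifacts_path(candidate: Union[str, Path, None]) -> Optional[Path]:
    """Return a usable artifacts directory, or None to allow remote fetching."""
    if candidate is None:
        return None
    path = Path(candidate).expanduser()
    # A missing or empty directory cannot supply model files, so fall back to None.
    if path.is_dir() and any(path.iterdir()):
        return path
    return None
```

The result can be passed as PipelineOptions(artifacts_path=...). For strictly offline deployments, raising an error instead of returning None may be the safer choice, since a silent fallback would attempt a download.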

ConvertPipelineOptions

Base configuration for document conversion pipelines.

from docling.datamodel.pipeline_options import ConvertPipelineOptions

options = ConvertPipelineOptions(
    do_picture_description=True,
    do_picture_classification=True
)

Parameters

Inherits all parameters from PipelineOptions.

do_picture_classification

bool

default:"False"

Enable picture classification to categorize images by type (photo, diagram, chart, etc.). Useful for downstream processing that requires image type awareness.

picture_classification_options

DocumentPictureClassifierOptions

Configuration for picture classification model/runtime. Supports selecting transformers, onnxruntime, or remote api_kserve_v2 inference engines.

do_picture_description

bool

default:"False"

Enable automatic generation of textual descriptions for pictures using vision-language models. Descriptions are added to the document for accessibility and searchability.

picture_description_options

PictureDescriptionBaseOptions

Configuration for the picture description model. Uses the new preset system (recommended). Default: 'smolvlm' preset. Example: PictureDescriptionVlmEngineOptions.from_preset('granite_vision')

do_chart_extraction

bool

default:"False"

Extract data in tabular format from bar charts, pie charts, and line charts.

PdfPipelineOptions

Configuration options for the PDF document processing pipeline.

from docling.datamodel.pipeline_options import EasyOcrOptions, PdfPipelineOptions

options = PdfPipelineOptions(
    do_ocr=True,
    do_table_structure=True,
    ocr_options=EasyOcrOptions(lang=["en"])
)

Parameters

Inherits all parameters from ConvertPipelineOptions.

do_table_structure

bool

default:"True"

Enable table structure extraction and reconstruction. Detects table regions, extracts cell content with row/column relationships, and reconstructs the logical table structure for downstream processing.

do_ocr

bool

default:"True"

Enable Optical Character Recognition for scanned or image-based PDFs. Replaces or supplements programmatic text extraction with OCR-detected text. Required for scanned documents with no embedded text layer. Note: OCR significantly increases processing time.

do_code_enrichment

bool

default:"False"

Enable specialized processing for code blocks. Applies code-aware OCR and formatting to improve accuracy of programming language snippets, terminal output, and structured code content.

do_formula_enrichment

bool

default:"False"

Enable mathematical formula recognition and LaTeX conversion. Uses specialized models to detect and extract mathematical expressions, converting them to LaTeX format for accurate representation.

force_backend_text

bool

default:"False"

Force use of the PDF backend's native text extraction instead of layout model predictions. When enabled, bypasses the layout model's text detection and uses the embedded text from the PDF file directly. Useful for PDFs with reliable programmatic text layers.

table_structure_options

BaseTableStructureOptions

default:"TableStructureOptions()"

Configuration for table structure extraction. Controls table detection accuracy, cell matching behavior, and table formatting. Only applicable when do_table_structure=True. See TableStructureOptions.

ocr_options

OcrOptions

default:"OcrAutoOptions()"

Configuration for the OCR engine. Specifies which OCR engine to use (Tesseract, EasyOCR, RapidOCR, etc.) and engine-specific settings. Only applicable when do_ocr=True. See OcrOptions.

layout_options

BaseLayoutOptions

default:"LayoutOptions()"

Configuration for the document layout analysis model. Controls layout detection behavior including cluster creation for orphaned elements, cell assignment to table structures, and handling of empty regions. Specifies which layout model to use (default: Heron).

code_formula_options

CodeFormulaVlmOptions

Configuration for code and formula extraction using a VLM. Uses the new preset system (recommended). Default: 'codeformulav2' preset. Only applicable when do_code_enrichment=True or do_formula_enrichment=True.

images_scale

float

default:"1.0"

Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements. Recommended values:

1.0 (standard quality)
2.0 (high resolution)
0.5 (lower resolution for previews)
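To make the trade-off concrete, the arithmetic below estimates rendered page dimensions for the recommended values. It is a sketch under the assumption that images_scale=1.0 corresponds to 72 DPI (one pixel per PDF point); the helper names are hypothetical, and the exact rounding Docling applies may differ.

```python
POINTS_PER_INCH = 72.0  # assumption: images_scale=1.0 renders one pixel per PDF point


def rendered_size(width_pt: float, height_pt: float, images_scale: float) -> tuple:
    """Approximate pixel dimensions of a page rendered at the given scale."""
    return round(width_pt * images_scale), round(height_pt * images_scale)


def effective_dpi(images_scale: float) -> float:
    """Approximate DPI implied by a scale factor."""
    return POINTS_PER_INCH * images_scale


# An A4 page is roughly 595 x 842 points.
for scale in (0.5, 1.0, 2.0):
    width_px, height_px = rendered_size(595.0, 842.0, scale)
    print(f"scale={scale}: ~{width_px}x{height_px} px at ~{effective_dpi(scale):.0f} DPI")
```

Because both dimensions scale linearly, the pixel count (and roughly the storage cost) grows with the square of images_scale: doubling the scale produces about four times as many pixels per page.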

generate_page_images

bool

default:"False"

Generate rendered page images during extraction. Creates PNG representations of each page for visual preview, validation, or downstream image-based machine learning tasks.

generate_picture_images

bool

default:"False"

Extract and save embedded images from the PDF. Exports individual images (figures, photos, diagrams, charts) found in the document as separate image files for downstream use.

generate_parsed_pages

bool

default:"False"

Retain intermediate parsed page representations after processing. When enabled, keeps detailed page-level parsing data structures for debugging or advanced post-processing. This increases memory usage; unless explicitly enabled, the parsed pages are discarded after document assembly.

Batching Options (Threaded Pipeline)

ocr_batch_size

int

default:"4"

Batch size for the OCR processing stage in the threaded pipeline. Pages are grouped and processed together to improve throughput. Higher values increase GPU/CPU utilization but require more memory. Only used by StandardPdfPipeline (threaded mode).

layout_batch_size

int

default:"4"

Batch size for the layout analysis stage in the threaded pipeline. Pages are grouped and processed together by the layout model. Higher values improve GPU utilization but require more memory. Only used by StandardPdfPipeline (threaded mode).

table_structure_batch_size

int

default:"4"

Batch size for the table structure extraction stage in the threaded pipeline. Only used by StandardPdfPipeline (threaded mode).
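The grouping that the batch size options control can be pictured with the sketch below. It is an illustration of fixed-size batching in general, not Docling's internal implementation; the batched helper and the page numbers are made up for the example.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Ten pages with a batch size of 4: the final batch is smaller than the rest.
pages = list(range(1, 11))
batches = list(batched(pages, 4))
for batch in batches:
    print(f"processing pages {batch} together")
```

A larger batch size means fewer model invocations per document, at the cost of holding more page data in memory at once.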

PaginatedPipelineOptions

Configuration for pipelines processing paginated documents.

from docling.datamodel.pipeline_options import PaginatedPipelineOptions

options = PaginatedPipelineOptions(
    images_scale=2.0,
    generate_page_images=True
)

Parameters

Inherits all parameters from ConvertPipelineOptions.

images_scale

float

default:"1.0"

Scaling factor for generated images. Higher values produce higher resolution but increase processing time and storage requirements.

generate_page_images

bool

default:"False"

Generate rendered page images during extraction.

generate_picture_images

bool

default:"False"

Extract and save embedded images from the document.

VlmPipelineOptions

Pipeline configuration for vision-language model based document processing.

from docling.datamodel.pipeline_options import VlmPipelineOptions, VlmConvertOptions

options = VlmPipelineOptions(
    vlm_options=VlmConvertOptions.from_preset("smoldocling")
)

Parameters

Inherits all parameters from PaginatedPipelineOptions.

generate_page_images

bool

default:"True"

Generate page images for VLM processing. Required for vision-language models to analyze document pages. Automatically enabled in VLM pipeline.

force_backend_text

bool

default:"False"

Force use of backend’s native text extraction instead of VLM predictions. When enabled, bypasses VLM text detection and uses embedded text from the document directly.

vlm_options

VlmConvertOptions | InlineVlmOptions | ApiVlmOptions

Vision-Language Model configuration for document understanding. Uses the new VlmConvertOptions with preset system (recommended). Default: 'granite_docling' preset. Example: VlmConvertOptions.from_preset('smoldocling'). Legacy InlineVlmOptions/ApiVlmOptions are still supported.

AsrPipelineOptions

Configuration options for the Automatic Speech Recognition (ASR) pipeline.

from docling.datamodel.pipeline_options import AsrPipelineOptions
from docling.datamodel import asr_model_specs

options = AsrPipelineOptions(
    asr_options=asr_model_specs.WHISPER_TINY
)

This pipeline processes audio files and converts speech to text using Whisper-based models. Supports various audio formats (MP3, WAV, FLAC, etc.) and video files with audio tracks.

Parameters

Inherits all parameters from PipelineOptions.

asr_options

InlineAsrOptions

default:"asr_model_specs.WHISPER_TINY"

Automatic Speech Recognition (ASR) model configuration for audio transcription. Specifies which ASR model to use (e.g., Whisper variants) and model-specific parameters for speech-to-text conversion.

Notes

Production Best Practices

Enabling multiple features (OCR, table structure, formulas) increases processing time significantly. Enable only necessary features for your use case.
For production systems processing large document volumes, implement timeout protection (90-120 seconds via document_timeout parameter).
OCR engines such as Tesseract and EasyOCR must be installed separately. Verify the installation before enabling OCR via do_ocr=True.
RapidOCR has known issues with read-only filesystems (e.g., Databricks). Consider Tesseract or alternative backends for distributed systems.
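The installation check recommended above could look roughly like the following. It is a sketch using only the standard library, under two assumptions: that Tesseract is used through its command-line binary named tesseract on PATH, and that EasyOCR is importable as the Python module easyocr. The helper name ocr_engine_available is hypothetical.

```python
import importlib.util
import shutil


def ocr_engine_available(engine: str) -> bool:
    """Best-effort check that an OCR engine appears to be installed."""
    if engine == "tesseract":
        # Assumes the CLI variant: look for the binary on PATH.
        return shutil.which("tesseract") is not None
    if engine == "easyocr":
        # Assumes the Python package exposes a top-level 'easyocr' module.
        return importlib.util.find_spec("easyocr") is not None
    # Unknown engines are reported as unavailable rather than guessed at.
    return False


for name in ("tesseract", "easyocr"):
    state = "found" if ocr_engine_available(name) else "missing"
    print(f"{name}: {state}")
```

Running a check like this at startup lets a service fail fast, or fall back to another engine, before any documents are queued.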
