Global Options
These options are available for the main docling convert command.
Input/Output Options
PDF files to convert. Can be local file paths, directory paths, or URLs. Multiple sources can be specified.
Examples:
docling convert document.pdf
docling convert /path/to/docs/
docling convert https://example.com/file.pdf
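Because several sources can be passed in one invocation, a wrapper script might assemble the argument list programmatically. The following is a minimal Python sketch under that assumption: `build_convert_command` is a hypothetical helper, not part of Docling, and it only builds the argument list in the shape of the examples above without running anything.

```python
from typing import List, Sequence


def build_convert_command(sources: Sequence[str]) -> List[str]:
    """Return an argv list for `docling convert` with the given sources.

    Hypothetical helper for illustration; it does not invoke Docling.
    """
    if not sources:
        raise ValueError("at least one source (file, directory, or URL) is required")
    # Each source is passed as its own positional argument.
    return ["docling", "convert", *sources]


if __name__ == "__main__":
    cmd = build_convert_command(["document.pdf", "https://example.com/file.pdf"])
    print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run` as-is, which avoids shell quoting problems with paths that contain spaces.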
Specify input formats to convert from.
Type: Multiple values allowed
Default: All formats
Available: pdf, image, docx, pptx, xlsx, html, md, latex, audio, mets_gbs

Specify output formats.
Type: Multiple values allowed
Default: markdown
Available: json, yaml, html, html_split_page, markdown, text, doctags, vtt

Output directory where results are saved.
Type: Path
Default: . (current directory)

Specify HTTP request headers used when fetching URL input sources.
Type: JSON string
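Since the headers value is a JSON string, it is usually safer to generate it with a JSON serializer than to write it by hand. The Python sketch below shows one way to do that; `headers_to_json` is a hypothetical helper, the header names and token are placeholders, and because the option's flag name is not shown in this section the sketch only produces the string value.

```python
import json
from typing import Dict


def headers_to_json(headers: Dict[str, str]) -> str:
    """Serialize HTTP headers to a compact JSON string (one CLI argument)."""
    return json.dumps(headers, separators=(",", ":"))


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {"Authorization": "Bearer <token>", "Accept": "application/pdf"}
    print(headers_to_json(example))
```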
Image Export Options
Image export mode for documents (applies to JSON, Markdown, and HTML outputs).
Type: Enum
Default: embedded
Options:
- placeholder: Only mark image positions in output
- embedded: Embed images as base64 encoded strings
- referenced: Export images as PNG files and reference them
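To make the difference between embedded and referenced concrete, the Python sketch below shows the general idea behind each mode: a base64 data URI versus a link to a separate file. It illustrates the concepts only and is not Docling's actual implementation; the function names and the file name are hypothetical.

```python
import base64


def embedded_image_uri(png_bytes: bytes) -> str:
    """Embed image bytes directly in the output as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def referenced_image_markdown(filename: str) -> str:
    """Reference an image that is stored as a separate PNG file."""
    return f"![Image]({filename})"


if __name__ == "__main__":
    fake_png = b"\x89PNG\r\n\x1a\n"  # PNG signature only, not a full image
    print(embedded_image_uri(fake_png))
    print(referenced_image_markdown("image_000.png"))
```

Embedding keeps the output self-contained at the cost of larger files, while referencing keeps the text output small but requires the PNG files to travel with it.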
If enabled, page images will show bounding boxes of detected items.
Type: Boolean
Default: false

Pipeline Configuration
Choose the processing pipeline for PDF or image files.
Type: Enum
Default: standard
Options:
- standard: Traditional document processing pipeline
- vlm: Vision-Language Model based pipeline
Choose the VLM (Vision-Language Model) preset to use with PDF or image files.
Type: String
Default: granite_docling
Available presets: granite_docling, smol_docling, and others
Only applicable when --pipeline vlm is set.

Choose the ASR (Automatic Speech Recognition) model for audio/video files.
Type: Enum
Default: whisper_tiny
Available models:
- Auto-select: whisper_tiny, whisper_base, whisper_small, whisper_medium, whisper_large, whisper_turbo
- MLX variants: whisper_tiny_mlx, whisper_base_mlx, whisper_small_mlx, whisper_medium_mlx, whisper_large_mlx, whisper_turbo_mlx
- Native variants: whisper_tiny_native, whisper_base_native, whisper_small_native, whisper_medium_native, whisper_large_native, whisper_turbo_native
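The ASR model names above follow a regular pattern: a Whisper size, optionally followed by an `_mlx` or `_native` suffix. The Python sketch below reproduces the list from that pattern; `asr_model_names` is a hypothetical helper that mirrors only the names listed here and does not query Docling.

```python
from typing import List

# Sizes and suffixes taken from the list of available models above.
SIZES = ["tiny", "base", "small", "medium", "large", "turbo"]
SUFFIXES = {"auto": "", "mlx": "_mlx", "native": "_native"}


def asr_model_names(variant: str = "auto") -> List[str]:
    """Return the model names for one variant group (auto, mlx, or native)."""
    suffix = SUFFIXES[variant]  # raises KeyError for an unknown variant
    return [f"whisper_{size}{suffix}" for size in SIZES]


if __name__ == "__main__":
    for variant in SUFFIXES:
        print(variant, asr_model_names(variant))
```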
OCR Options
Enable or disable OCR for bitmap content.
Type: Boolean
Default: true

Replace any existing text with OCR-generated text over the full content.
Type: Boolean
Default: false
Use this when existing text extraction is poor quality.

The OCR engine to use.
Type: String
Default: auto
Available engines: auto, tesseract_cli, tesseract, easyocr, rapidocr
Additional engines may be available with --allow-external-plugins.

Comma-separated list of languages for the OCR engine.
Type: String
Note: Each OCR engine has different language code formats.

Page Segmentation Mode for the OCR engine.
Type: Integer (0-13)
Applies to: Tesseract engines only
See Tesseract documentation for PSM mode details.
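Because the page segmentation mode must be an integer from 0 to 13 and only applies to Tesseract engines, a wrapper script could validate both conditions before invoking the CLI. The Python sketch below is one way to do so; the helper names are hypothetical, and treating only the explicitly named Tesseract engines as applicable is an assumption, since this section does not say what `auto` resolves to.

```python
# Engine names taken from the list of available engines above.
TESSERACT_ENGINES = {"tesseract", "tesseract_cli"}


def psm_applies(ocr_engine: str) -> bool:
    """Return True if the engine is explicitly one of the Tesseract engines.

    Assumption: `auto` is treated as not applicable because its resolved
    engine is not known in advance.
    """
    return ocr_engine in TESSERACT_ENGINES


def validate_psm(psm: int) -> int:
    """Return psm unchanged if it lies in the documented range 0-13."""
    if not 0 <= psm <= 13:
        raise ValueError(f"PSM must be between 0 and 13, got {psm}")
    return psm


if __name__ == "__main__":
    print(psm_applies("tesseract"), validate_psm(6))
```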
Table Processing Options
Enable or disable table structure extraction.
Type: Boolean
Default: true

The mode to use in the table structure model.
Type: Enum
Default: accurate
Options:
- fast: Faster processing, less accurate
- accurate: Slower processing, more accurate
PDF Backend Options
The PDF backend to use for processing.
Type: Enum
Default: docling_parse
Options:
- docling_parse: Recommended backend (default)
- pypdfium2: Alternative backend

Password for protected PDF documents.
Type: String
Enrichment Options
Enable code enrichment model in the pipeline.
Type: Boolean
Default: false
Improves detection and formatting of code blocks.

Enable formula enrichment model in the pipeline.
Type: Boolean
Default: false
Improves detection and rendering of mathematical formulas.

Enable picture classification enrichment model.
Type: Boolean
Default: false
Classifies images into categories (charts, diagrams, photos, etc.).

Enable picture description model.
Type: Boolean
Default: false
Generates textual descriptions for images.

Enable chart extraction to convert bar, pie, and line charts to tabular format.
Type: Boolean
Default: false

Performance Options
Number of threads to use for processing.
Type: Integer
Default: 4

Accelerator device to use for model inference.
Type: Enum
Default: auto
Options:
- auto: Automatically select best available device
- cpu: Use CPU only
- cuda: Use NVIDIA GPU
- mps: Use Apple Metal Performance Shaders (Mac)

Number of pages processed in one batch.
Type: Integer
Default: System default (configurable)
Larger values may improve throughput but use more memory.

Timeout for processing each document, in seconds.
Type: Float
Model and Plugin Options
Location of model artifacts for offline use.
Type: Path

Must be enabled when using models that connect to remote services.
Type: Boolean
Default: false

Must be enabled for loading modules from third-party plugins.
Type: Boolean
Default: false

List third-party plugins available when --allow-external-plugins is set.
Type: Boolean
Displays available OCR, layout, and table extraction plugins.

Error Handling Options
Control whether processing should abort on first error.
Type: Boolean
Default: false (continue on errors)
When disabled, failed documents are logged but processing continues.

Verbosity and Logging Options
Set verbosity level for logging.
Type: Count (repeatable)
Levels:
- No flag: WARNING level
- -v: INFO level
- -vv: DEBUG level
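The mapping from a repeat count to a log level can be sketched with Python's standard `logging` module. This is an illustration of the table above, not Docling's own code; `verbosity_to_level` is a hypothetical name, and treating more than two repetitions as DEBUG is an assumption.

```python
import logging


def verbosity_to_level(count: int) -> int:
    """Map the number of -v flags to a logging level."""
    if count <= 0:
        return logging.WARNING  # no flag
    if count == 1:
        return logging.INFO  # -v
    return logging.DEBUG  # -vv (assumed to cover higher counts too)


if __name__ == "__main__":
    logging.basicConfig(level=verbosity_to_level(1))
    logging.getLogger(__name__).info("INFO messages are visible with -v")
```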
Debug Visualization Options
Enable debug output which visualizes PDF cells.
Type: Boolean
Default: false

Enable debug output which visualizes OCR cells.
Type: Boolean
Default: false

Enable debug output which visualizes layout clusters.
Type: Boolean
Default: false

Enable debug output which visualizes table cells.
Type: Boolean
Default: false

Profiling Options
Enable profiling to summarize timing details for all conversion stages.
Type: Boolean
Default: false
Displays a detailed timing table after conversion.

Save profiling summaries to JSON files.
Type: Boolean
Default: false

Information Options
Show version information and exit.
Output includes:
- Docling version
- Docling Core version
- Docling IBM Models version
- Docling Parse version
- Python version and implementation
- Platform information
Display Docling ASCII art logo and exit.
Model Download Options
These options apply to the docling tools models download command.
Specific models to download.
Type: Multiple values allowed
Available models:
- layout: Layout analysis model
- tableformer: Table structure extraction model
- code_formula: Code and formula detection model
- picture_classifier: Picture classification model
- smolvlm: Small VLM model
- granitedocling: Granite Docling VLM
- granitedocling_mlx: Granite Docling for MLX
- smoldocling: Small Docling VLM
- smoldocling_mlx: Small Docling for MLX
- granite_vision: Granite Vision model
- granite_chart_extraction: Chart extraction model
- rapidocr: RapidOCR model
- easyocr: EasyOCR model
Default: layout, tableformer, code_formula, picture_classifier, rapidocr
Directory where models will be downloaded.
Type: Path
Default: System cache directory

Force download even if models already exist.
Type: Boolean
Default: false

Download all available models.
Type: Boolean
Default: false
Note: Mutually exclusive with specifying individual models.
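The mutual-exclusion rule between downloading everything and naming individual models can be expressed as a small pre-flight check. The Python sketch below shows the rule itself; `check_model_selection` is a hypothetical helper and says nothing about how Docling reports the conflict.

```python
from typing import Optional, Sequence


def check_model_selection(models: Optional[Sequence[str]], download_all: bool) -> None:
    """Raise ValueError if both 'all models' and specific models are requested."""
    if download_all and models:
        raise ValueError(
            "downloading all models is mutually exclusive with "
            "specifying individual models"
        )


if __name__ == "__main__":
    check_model_selection(["layout", "tableformer"], download_all=False)  # accepted
    check_model_selection(None, download_all=True)  # accepted
    print("selections are valid")
```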
Minimal output mode.
Type: Boolean
Default: false
When enabled, only prints the output directory path. Useful for scripts.

HuggingFace Download Options
These options apply to the docling tools models download-hf-repo command.
HuggingFace repository IDs to download.
Type: Multiple values allowed
Format: org-name/repo-name

Directory where models will be downloaded.
Type: Path
Default: System cache directory
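The org-name/repo-name format for repository IDs can be checked before starting a download. The Python sketch below uses a simple regular expression for that shape; the pattern is an approximation chosen for illustration, not Hugging Face's own validation rules, and `looks_like_repo_id` is a hypothetical name.

```python
import re

# Approximate pattern: two non-empty segments of letters, digits, '.', '_' or '-',
# separated by exactly one slash. This is an assumption, not an official rule.
_REPO_ID = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*$")


def looks_like_repo_id(repo_id: str) -> bool:
    """Return True if repo_id has the org-name/repo-name shape."""
    return _REPO_ID.match(repo_id) is not None


if __name__ == "__main__":
    for candidate in ["org-name/repo-name", "repo-name", "org/name/extra"]:
        print(candidate, looks_like_repo_id(candidate))
```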
Force download even if model already exists.
Type: Boolean
Default: false

Minimal output mode.
Type: Boolean
Default: false
When enabled, only prints the output directory path.