Docling provides a powerful command-line interface for document conversion and model management.

Main Commands

docling convert

Convert documents from various formats to different output formats.
docling convert [OPTIONS] SOURCE...

Arguments

| Argument | Description |
| --- | --- |
| SOURCE | PDF files to convert. Can be local file paths, directory paths, or URLs. Multiple sources can be specified. |

Basic Examples

Convert a single PDF to Markdown:
docling convert document.pdf
Convert multiple documents:
docling convert doc1.pdf doc2.docx doc3.pptx
Convert all PDFs in a directory:
docling convert /path/to/documents/
Convert from URL:
docling convert https://example.com/document.pdf
Convert to specific output formats:
docling convert document.pdf --to json --to html --to markdown

Output Format Options

| Option | Description |
| --- | --- |
| --from | Specify input formats to convert from. Defaults to all formats. |
| --to | Specify output formats. Available: json, yaml, html, html_split_page, markdown, text, doctags, vtt. Defaults to markdown. |
| --output | Output directory where results are saved. Default: current directory (.) |
| --image-export-mode | Image export mode. Options: placeholder, embedded, referenced. Default: embedded |
| --show-layout | If enabled, page images will show bounding boxes of items. |
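
Because --to is given once per output format, a script that receives the formats as a list has to expand them into repeated flags. The following is a minimal POSIX shell sketch of that expansion. The names to_flags and FORMATS are illustrative and not part of Docling, and the script only prints the resulting command instead of running it.

```shell
# Hypothetical helper (not part of Docling): turn each format name
# into its own "--to <format>" pair.
to_flags() {
  flags=""
  for fmt in "$@"; do
    flags="$flags --to $fmt"
  done
  # Drop the leading space left by the first iteration.
  printf '%s\n' "${flags# }"
}

FORMATS="json html markdown"
# $FORMATS is deliberately unquoted so that each word becomes one argument.
# Dry run: print the command instead of executing it.
printf 'docling convert document.pdf %s --output ./output\n' "$(to_flags $FORMATS)"
```

Once the printed command looks right, it can be pasted into a terminal or run directly from the script.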

Pipeline Options

| Option | Description |
| --- | --- |
| --pipeline | Choose the processing pipeline. Options: standard, vlm. Default: standard |
| --vlm-model | Choose the VLM preset to use (when using VLM pipeline). Default: granite_docling. Available presets include: granite_docling, smol_docling, etc. |
| --asr-model | Choose the ASR model for audio/video files. Options include: whisper_tiny, whisper_base, whisper_small, whisper_medium, whisper_large, whisper_turbo, and MLX/native variants. Default: whisper_tiny |

Processing Options

| Option | Description |
| --- | --- |
| --ocr / --no-ocr | Enable/disable OCR for bitmap content. Default: enabled |
| --force-ocr | Replace any existing text with OCR generated text over the full content. |
| --ocr-engine | The OCR engine to use. Options: auto, tesseract_cli, tesseract, easyocr, rapidocr. Default: auto |
| --ocr-lang | Comma-separated list of languages for OCR engine. |
| --psm | Page Segmentation Mode for OCR engine (0-13). |
| --tables / --no-tables | Enable/disable table structure extraction. Default: enabled |
| --table-mode | Table structure model mode. Options: fast, accurate. Default: accurate |
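
Unlike --to, which is repeated, --ocr-lang takes all of its values as one comma-separated string. A minimal sketch of building that string from separate language codes is shown below. The helper name join_langs is hypothetical, the codes en and de are placeholders (valid codes depend on the chosen OCR engine), and the command is only printed.

```shell
# Hypothetical helper (not part of Docling): join arguments with commas.
# The function body runs in a subshell, so the IFS change does not leak.
join_langs() (
  IFS=,
  printf '%s\n' "$*"
)

# Dry run: print the command instead of executing it.
printf 'docling convert scan.pdf --ocr-engine easyocr --ocr-lang %s\n' "$(join_langs en de)"
```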

Enrichment Options

| Option | Description |
| --- | --- |
| --enrich-code | Enable code enrichment model in the pipeline. |
| --enrich-formula | Enable formula enrichment model in the pipeline. |
| --enrich-picture-classes | Enable picture classification enrichment model. |
| --enrich-picture-description | Enable picture description model. |
| --enrich-chart-extraction | Enable chart extraction to convert bar, pie, and line charts to tabular format. |

PDF Backend Options

| Option | Description |
| --- | --- |
| --pdf-backend | The PDF backend to use. Options: docling_parse, pypdfium2. Default: docling_parse |
| --pdf-password | Password for protected PDF documents. |

Performance Options

| Option | Description |
| --- | --- |
| --num-threads | Number of threads. Default: 4 |
| --device | Accelerator device. Options: auto, cpu, cuda, mps. Default: auto |
| --page-batch-size | Number of pages processed in one batch. |
| --document-timeout | Timeout for processing each document, in seconds. |
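
As a rough illustration of how these options combine, the sketch below derives a --num-threads value from the number of online CPUs and prints a convert command that also pins the device and sets a per-document timeout. The helper cpu_count is hypothetical, using every CPU is an assumption and not a Docling recommendation, and the 120-second timeout is an arbitrary example value.

```shell
# Hypothetical helper: report the number of online CPUs, falling back to 4
# (the documented --num-threads default) when it cannot be determined.
cpu_count() {
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=""
  case "$n" in
    ''|*[!0-9]*) n=4 ;;   # empty or non-numeric result: use the fallback
  esac
  printf '%s\n' "$n"
}

THREADS=$(cpu_count)
# Dry run: print the command instead of executing it.
printf 'docling convert ./documents --num-threads %s --device cpu --document-timeout 120\n' "$THREADS"
```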

Model and Plugin Options

| Option | Description |
| --- | --- |
| --artifacts-path | Location of the model artifacts (for offline use). |
| --enable-remote-services | Must be enabled when using models connecting to remote services. |
| --allow-external-plugins | Must be enabled for loading modules from third-party plugins. |
| --show-external-plugins | List third-party plugins available with --allow-external-plugins. |

Debug Options

| Option | Description |
| --- | --- |
| --debug-visualize-cells | Enable debug output which visualizes PDF cells. |
| --debug-visualize-ocr | Enable debug output which visualizes OCR cells. |
| --debug-visualize-layout | Enable debug output which visualizes layout clusters. |
| --debug-visualize-tables | Enable debug output which visualizes table cells. |

Profiling Options

| Option | Description |
| --- | --- |
| --profiling | Summarize profiling details for all conversion stages. |
| --save-profiling | Save profiling summaries to JSON. |

Other Options

| Option | Description |
| --- | --- |
| --headers | Specify HTTP request headers for URL sources (JSON string). |
| --abort-on-error / --no-abort-on-error | If enabled, processing aborts on first error. Default: disabled |
| -v, --verbose | Set verbosity level. Use -v for info logging, -vv for debug logging. |
| --version | Show version information. |
| --logo | Display Docling ASCII art logo. |
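
Since --headers expects a JSON string, scripts usually need to assemble that string from a secret held in a variable instead of hard-coding it. The sketch below shows one way to do this in POSIX shell. The helper auth_headers and the API_TOKEN variable are hypothetical, the token is assumed to contain no double quotes or backslashes, and the command is only printed.

```shell
# Hypothetical helper (not part of Docling): build the JSON object for --headers.
# Assumes the token contains no double quotes or backslashes.
auth_headers() {
  printf '{"Authorization": "Bearer %s"}' "$1"
}

# Use the token from the environment, or a placeholder if none is set.
API_TOKEN="${API_TOKEN:-token123}"
HEADERS=$(auth_headers "$API_TOKEN")

# Dry run: print the command, with the JSON wrapped in single quotes.
printf "docling convert https://example.com/doc.pdf --headers '%s'\n" "$HEADERS"
```

For tokens that may contain special characters, a JSON-aware tool is a safer way to produce the string than printf.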

Advanced Examples

Convert with OCR and table extraction:
docling convert document.pdf --ocr --tables --ocr-engine easyocr
Convert to multiple formats with custom output:
docling convert document.pdf --to json --to markdown --to html --output ./output
Use VLM pipeline with specific model:
docling convert document.pdf --pipeline vlm --vlm-model granite_docling
Convert with enrichment features:
docling convert document.pdf \
  --enrich-formula \
  --enrich-code \
  --enrich-picture-description \
  --enrich-chart-extraction
Convert password-protected PDF:
docling convert protected.pdf --pdf-password "mypassword"
Convert directory with profiling:
docling convert ./documents --profiling --save-profiling --output ./results
Convert with custom HTTP headers:
docling convert https://example.com/doc.pdf \
  --headers '{"Authorization": "Bearer token123", "User-Agent": "Docling"}'

Tools Commands

Docling provides helper commands for managing models and other utilities.

docling tools models download

Download Docling models for offline use.
docling tools models download [OPTIONS] [MODELS]...

Arguments

| Argument | Description |
| --- | --- |
| MODELS | Specific models to download. Available options: layout, tableformer, code_formula, picture_classifier, smolvlm, granitedocling, granitedocling_mlx, smoldocling, smoldocling_mlx, granite_vision, granite_chart_extraction, rapidocr, easyocr. |

Options

| Option | Description |
| --- | --- |
| -o, --output-dir | Directory where models will be downloaded. Default: system cache directory |
| --force | Force download even if models already exist. |
| --all | Download all available models (mutually exclusive with specifying models). |
| -q, --quiet | Minimal output, prints only the output directory. |
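
A script that accepts model names from its caller can check them against the MODELS list before starting a download. The sketch below is one possible pre-flight check. KNOWN_MODELS is copied from the list above and may lag behind the installed Docling version, is_known_model is a hypothetical helper, and the download command is only printed.

```shell
# Model names copied from the MODELS argument description.
KNOWN_MODELS="layout tableformer code_formula picture_classifier smolvlm granitedocling granitedocling_mlx smoldocling smoldocling_mlx granite_vision granite_chart_extraction rapidocr easyocr"

# Hypothetical helper (not part of Docling): succeed only for a listed name.
is_known_model() {
  case "$1" in
    ''|*" "*) return 1 ;;   # reject empty names and names containing spaces
  esac
  case " $KNOWN_MODELS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

REQUESTED="layout tableformer easyocr"
for m in $REQUESTED; do
  is_known_model "$m" || { echo "unknown model: $m" >&2; exit 1; }
done
# Dry run: print the command instead of executing it.
printf 'docling tools models download %s\n' "$REQUESTED"
```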

Examples

Download default models:
docling tools models download
This downloads the default set: layout, tableformer, code_formula, picture_classifier, and rapidocr.
Download specific models:
docling tools models download layout tableformer easyocr
Download all available models:
docling tools models download --all
Download to custom directory:
docling tools models download --output-dir /path/to/models
Force re-download:
docling tools models download layout --force
Quiet mode (useful for scripts):
MODEL_DIR=$(docling tools models download --quiet)
echo "Models are in: $MODEL_DIR"
Use downloaded models:
# First download models
docling tools models download --output-dir ./my-models

# Then use them with convert
docling convert document.pdf --artifacts-path ./my-models

docling tools models download-hf-repo

Download specific models from HuggingFace by repository ID.
docling tools models download-hf-repo [OPTIONS] MODELS...

Arguments

| Argument | Description |
| --- | --- |
| MODELS | HuggingFace repository IDs to download (e.g., docling-project/docling-models). |

Options

| Option | Description |
| --- | --- |
| -o, --output-dir | Directory where models will be downloaded. Default: system cache directory |
| --force | Force download even if model already exists. |
| -q, --quiet | Minimal output, prints only the output directory. |

Examples

Download a HuggingFace model:
docling tools models download-hf-repo docling-project/docling-models
Download multiple HuggingFace models:
docling tools models download-hf-repo \
  docling-project/docling-models \
  some-org/custom-model
Download to custom directory:
docling tools models download-hf-repo docling-project/docling-models \
  --output-dir /path/to/models
Force re-download:
docling tools models download-hf-repo docling-project/docling-models --force
