Use Vision-Language Models (VLMs) to convert PDFs with enhanced understanding of layout, text, and visual elements.
Overview
The VLM pipeline converts documents end to end by passing page images to a vision-language model, which reads text, layout, and visual elements together. This example shows:
- Default VLM configuration (simplest approach)
- Using model presets
- Runtime overrides for different hardware (MLX, Transformers)
- Automatic hardware selection
Simple VLM Conversion
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline
# Convert a public arXiv PDF
source = "https://arxiv.org/pdf/2501.17887"
# Default configuration: uses the GraniteDocling model
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
        ),
    }
)
doc = converter.convert(source=source).document
print(doc.export_to_markdown())
The default VLM configuration uses the GraniteDocling model and automatically selects the best runtime for your platform.
Using Presets
1. Import VLM options: import VlmConvertOptions and VlmPipelineOptions.
2. Create a preset configuration: call VlmConvertOptions.from_preset() with a preset name.
3. Configure the pipeline: pass the VLM options to the pipeline configuration through VlmPipelineOptions(vlm_options=...).
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    VlmConvertOptions,
    VlmPipelineOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline
source = "https://arxiv.org/pdf/2501.17887"
# Use the granite_docling preset
vlm_options = VlmConvertOptions.from_preset("granite_docling")
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=VlmPipelineOptions(vlm_options=vlm_options),
        ),
    }
)
doc = converter.convert(source=source).document
print(doc.export_to_markdown())
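Runtime Overrides
Called without engine options, VlmConvertOptions.from_preset() selects the best runtime for your platform automatically. To force a specific runtime, pass engine_options, as the MLX example below does with MlxVlmEngineOptions().
# No engine_options: the runtime is selected automatically for your platform
vlm_options = VlmConvertOptions.from_preset("granite_docling")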
Available Presets
Docling provides several VLM model presets:
- granite_docling: Default GraniteDocling model (recommended)
- Other presets may be available depending on your Docling version; check the Docling documentation for the current list
Complete Example with MLX
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    VlmConvertOptions,
    VlmPipelineOptions,
)
from docling.datamodel.vlm_engine_options import (
    MlxVlmEngineOptions,
    VlmEngineType,
)
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline
source = "https://arxiv.org/pdf/2501.17887"
# Configure MLX runtime for Apple Silicon
vlm_options = VlmConvertOptions.from_preset(
    "granite_docling",
    engine_options=MlxVlmEngineOptions(),
)
# Print the model repository ID that will be used with the MLX engine
print(f"Using model: {vlm_options.model_spec.get_repo_id(VlmEngineType.MLX)}")
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=VlmPipelineOptions(vlm_options=vlm_options),
        ),
    }
)
doc = converter.convert(source=source).document
print(doc.export_to_markdown())
Requirements
- Python 3.9+
- Docling with the VLM extras: pip install "docling[vlm]" (the quotes keep shells such as zsh from expanding the brackets)
- For the Transformers backend:
pip install transformers
- For the MLX backend (Apple Silicon):
pip install mlx-vlm
- Network access to download model weights from Hugging Face
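To confirm which of these packages your interpreter can see before running the examples, here is a minimal sketch using the standard library's importlib.util.find_spec. The helper name check_backends is hypothetical and not part of Docling; it only checks that each top-level module can be found, not its version or which extras were installed.

```python
import importlib.util


def check_backends(modules=("docling", "transformers", "mlx")):
    """Return a mapping of module name -> whether it can be found.

    Hypothetical helper, not a Docling API: it looks up each top-level
    module on sys.path without importing it and does not check versions.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, available in check_backends().items():
        print(f"{name}: {'installed' if available else 'missing'}")
```

A missing "mlx" entry is expected on machines other than Apple Silicon.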
Hardware Recommendations
- Apple Silicon (M1 or later): Use the MLX runtime for optimized performance
- NVIDIA GPUs: Use the Transformers runtime with CUDA
- CPU only: Use the Transformers runtime (slower, but works without a GPU)
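The recommendations above amount to a simple decision rule. The sketch below encodes it with the standard library's platform module so a script can choose a runtime before building its options. It illustrates the list only and is not Docling's own automatic selection: the function name recommend_runtime and its return labels are hypothetical, and CUDA availability is supplied by the caller (for example from torch.cuda.is_available() when PyTorch is installed).

```python
import platform


def recommend_runtime(system=None, machine=None, has_cuda=False):
    """Map the hardware recommendations to a runtime label.

    Hypothetical helper: returns "mlx", "transformers-cuda" or
    "transformers-cpu". It does not call any Docling API.
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    # Apple Silicon reports a Darwin system with an arm64 machine type.
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    # An NVIDIA GPU is available: Transformers with CUDA.
    if has_cuda:
        return "transformers-cuda"
    # Fallback: Transformers on CPU (slower, but needs no GPU).
    return "transformers-cpu"


if __name__ == "__main__":
    print(recommend_runtime())
```

When the result is "mlx", the MLX example above shows the corresponding configuration: engine_options=MlxVlmEngineOptions() passed to VlmConvertOptions.from_preset(). For the other labels you can rely on the preset's automatic selection.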
VLMs download their model weights on first use, so make sure you have network access and enough free disk space before the first conversion.
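Assuming the weights land in the default Hugging Face Hub cache (the directory in HF_HOME if set, otherwise "huggingface" under XDG_CACHE_HOME or ~/.cache), this minimal sketch reports the free space available there using only the standard library. The helper name hf_cache_free_gb and the 10 GB threshold are illustrative assumptions, not documented requirements of the GraniteDocling model.

```python
import os
import shutil
from pathlib import Path


def hf_cache_free_gb():
    """Return (cache_dir, free space in GB) for the Hugging Face cache.

    Assumption: weights go to the default Hugging Face Hub cache, which is
    $HF_HOME if set, else $XDG_CACHE_HOME/huggingface or ~/.cache/huggingface.
    """
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        cache_dir = Path(hf_home).expanduser()
    else:
        xdg = os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
        cache_dir = Path(xdg).expanduser() / "huggingface"
    # disk_usage needs an existing path, so walk up to the nearest one.
    probe = cache_dir
    while not probe.exists():
        probe = probe.parent
    free_gb = shutil.disk_usage(probe).free / 1e9
    return cache_dir, free_gb


if __name__ == "__main__":
    MIN_FREE_GB = 10  # hypothetical threshold, not an official requirement
    path, free = hf_cache_free_gb()
    status = "OK" if free >= MIN_FREE_GB else "low"
    print(f"{path}: {free:.1f} GB free ({status})")
```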