Docling provides enrichment features that enhance converted documents with additional metadata and structured information. Enrichments process specific document components using specialized models:
- **Code Understanding**: parse and extract code blocks with language detection
- **Formula Understanding**: extract LaTeX representations from mathematical formulas
- **Picture Classification**: classify images into semantic categories
- **Picture Description**: generate natural-language descriptions for images
Enrichment steps require additional model executions and significantly increase processing time. Most enrichments are disabled by default.
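Each enrichment is switched on by its own boolean flag on `PdfPipelineOptions`. A typical setup enabling all four might look like the following (flag names as used by Docling's PDF pipeline; verify against your installed version):

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.do_code_enrichment = True         # code language detection
pipeline_options.do_formula_enrichment = True      # LaTeX extraction
pipeline_options.do_picture_classification = True  # semantic image categories
pipeline_options.do_picture_description = True     # natural-language captions

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
```

Since every enabled flag adds a model execution per matching element, enable only the enrichments you actually consume downstream.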
To generate picture descriptions with a locally executed vision-language model:

```python
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionVlmOptions,
)

pipeline_options = PdfPipelineOptions()
pipeline_options.picture_description_options = PictureDescriptionVlmOptions(
    repo_id="your-org/your-vlm-model",
    prompt="Describe the image in three sentences. Be concise and accurate.",
)
pipeline_options.do_picture_description = True
```
Connect to remote inference servers or cloud providers:
```python
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionApiOptions,
)

pipeline_options = PdfPipelineOptions()

# Required for remote connections
pipeline_options.enable_remote_services = True

pipeline_options.picture_description_options = PictureDescriptionApiOptions(
    url="http://localhost:8000/v1/chat/completions",  # vLLM, Ollama, etc.
    params=dict(
        model="your-model-name",
        seed=42,
        max_completion_tokens=200,
    ),
    prompt="Describe the image in three sentences. Be concise and accurate.",
    timeout=90,
)
pipeline_options.do_picture_description = True
```
- **Code/Formula**: relatively fast (transformers inference)
- **Picture Classification**: fast (small model, ~100 MB)
- **Picture Description**: slower (VLM inference per image)
For documents with many images, picture description can significantly increase processing time.
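As a back-of-envelope illustration of why image-heavy documents suffer most, the added overhead scales with per-element inference cost. All per-item timings below are hypothetical placeholders, not measurements:

```python
# Hypothetical per-element inference times (seconds)
per_formula_s = 0.2  # transformers inference, relatively fast
per_class_s = 0.05   # small classifier
per_desc_s = 2.0     # VLM description, slow

# Hypothetical document contents
n_formulas, n_pictures = 40, 25

total = n_formulas * per_formula_s + n_pictures * (per_class_s + per_desc_s)
print(f"extra processing time ≈ {total} s")
```

With these placeholder numbers, the 25 pictures account for over 85% of the added time, which is why disabling picture description (or batching images to a GPU-backed server) is usually the first optimization to try.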
Memory Usage
- Each enrichment model loads into memory
- VLMs can use 500 MB to over 8 GB depending on model size
- Consider GPU acceleration for faster VLM processing
- Use smaller models (e.g. SmolVLM) for resource-constrained environments
Batch Processing
Enrichments process elements in batches:
```python
from docling.datamodel.settings import settings

# Adjust batch size for enrichments
settings.perf.elements_batch_size = 32  # Default varies by model
```
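Independent of Docling, the batching itself is plain chunking: elements are grouped into fixed-size batches and handed to the model one batch at a time. A minimal sketch:

```python
from itertools import islice

def batched(items, batch_size):
    """Yield successive batches of at most batch_size elements."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch

# 70 elements with batch size 32 yield two full batches and a remainder
batches = list(batched(range(70), 32))
print([len(b) for b in batches])  # → [32, 32, 6]
```

Larger batches amortize per-call overhead (useful on GPUs) at the cost of peak memory; smaller batches keep memory flat but make more model calls.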
Selective Enrichment
Filter which elements to enrich:
```python
# Only enrich specific picture types
from docling_core.types.doc import PictureItem

# After conversion, manually filter
for item, level in doc.iterate_items():
    if isinstance(item, PictureItem):
        if item.classification in ["Chart", "Diagram"]:
            # Enrich only charts and diagrams
            pass
```
You can implement your own enrichment by subclassing `BaseEnrichmentModel`:

```python
from typing import Iterable

from docling.models.base_model import BaseEnrichmentModel
from docling_core.types.doc import DoclingDocument, NodeItem, TextItem


class CustomEnrichmentModel(BaseEnrichmentModel):
    def __init__(self, **kwargs):
        super().__init__()
        # Initialize your model here
        self.model = load_your_model()

    def is_processable(self, doc: DoclingDocument, element: NodeItem) -> bool:
        """Determine if this element should be enriched."""
        # Example: only process TextItem elements
        return isinstance(element, TextItem)

    def __call__(
        self, doc: DoclingDocument, element_batch: Iterable[NodeItem]
    ) -> Iterable[NodeItem]:
        """Process a batch of elements."""
        batch_list = list(element_batch)
        # Your enrichment logic here
        for element in batch_list:
            # Enrich the element
            element.enriched_data = self.model.process(element.text)
        return batch_list
```
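To see the interface in action without loading any real model, here is a self-contained toy run mirroring the same `is_processable`/`__call__` flow (`SimpleItem` and `UpperCaseModel` are made-up names for this sketch, not Docling classes):

```python
class SimpleItem:
    """Toy stand-in for a document element carrying text."""
    def __init__(self, text):
        self.text = text


class UpperCaseModel:
    """Toy stand-in for an enrichment model: uppercases element text."""

    def is_processable(self, doc, element):
        # Only elements of the expected type are enriched
        return isinstance(element, SimpleItem)

    def __call__(self, doc, element_batch):
        batch = list(element_batch)
        for element in batch:
            element.enriched_data = element.text.upper()
        return batch


model = UpperCaseModel()
items = [SimpleItem("hello"), SimpleItem("world")]
# The pipeline first filters with is_processable, then calls the model on a batch
enriched = model(None, (i for i in items if model.is_processable(None, i)))
print([e.enriched_data for e in enriched])  # → ['HELLO', 'WORLD']
```

The real pipeline follows the same shape: it filters elements with `is_processable`, groups survivors into batches, and attaches the model's output back onto each element.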