Skip to main content

Overview

Docling provides enrichment features that enhance converted documents with additional metadata and structured information. Enrichments process specific document components using specialized models:
  • Code Understanding: Parse and extract code blocks with language detection
  • Formula Understanding: Extract LaTeX representations from mathematical formulas
  • Picture Classification: Classify images into semantic categories
  • Picture Description: Generate natural language descriptions for images
Enrichment steps require additional model executions and significantly increase processing time. Most enrichments are disabled by default.

Available Enrichments

FeatureParameterProcessed ItemModel
Code Understandingdo_code_enrichmentCodeItemCodeFormulaV2
Formula Understandingdo_formula_enrichmentTextItem (FORMULA label)CodeFormulaV2
Picture Classificationdo_picture_classificationPictureItemDocumentFigureClassifier
Picture Descriptiondo_picture_descriptionPictureItemSmolVLM / Granite Vision
Source: ~/workspace/source/docs/usage/enrichments.md:7

Code Understanding

Overview

The code understanding enrichment:
  • Parses code blocks found in documents
  • Detects programming language automatically
  • Sets the code_language property on CodeItem elements
  • Enables syntax-aware processing
Model: CodeFormula

Usage

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.do_code_enrichment = True

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options
        )
    }
)

result = converter.convert("document.pdf")
doc = result.document

# Access enriched code items
for item, level in doc.iterate_items():
    if isinstance(item, CodeItem):
        print(f"Language: {item.code_language}")
        print(f"Code: {item.text}")
Source: ~/workspace/source/docs/usage/enrichments.md:18

Example Output

# Detected code block
code_item.code_language  # "python"
code_item.text           # "def hello():\n    print('Hello')\n"

Formula Understanding

Overview

The formula understanding enrichment:
  • Analyzes mathematical equations in documents
  • Extracts LaTeX representation
  • Enables MathML rendering in HTML exports
  • Preserves mathematical semantics
Model: CodeFormula

Usage

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.do_formula_enrichment = True

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options
        )
    }
)

result = converter.convert("math_paper.pdf")
doc = result.document

# Export with MathML rendering
html = doc.export_to_html()
Source: ~/workspace/source/docs/usage/enrichments.md:48

Example Output

# Formula with LaTeX
text_item.label          # "FORMULA"
text_item.text           # Original formula text
text_item.enriched_data  # {"latex": "E = mc^2"}

Picture Classification

Overview

The picture classification enrichment:
  • Classifies images into semantic categories
  • Specialized for document imagery
  • Detects charts, diagrams, logos, signatures, etc.
  • Adds classification metadata to PictureItem elements
Model: DocumentFigureClassifier v2.0

Supported Classes

  • Chart types (bar, line, pie, scatter, etc.)
  • Diagrams and flowcharts
  • Natural images and photographs
  • Logos and branding elements
  • Signatures
  • Technical illustrations

Usage

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.generate_picture_images = True  # Required
pipeline_options.images_scale = 2  # Higher quality for classification
pipeline_options.do_picture_classification = True

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options
        )
    }
)

result = converter.convert("document.pdf")
doc = result.document

# Access classified pictures
for item, level in doc.iterate_items():
    if isinstance(item, PictureItem):
        print(f"Classification: {item.classification}")
        print(f"Confidence: {item.classification_confidence}")
Source: ~/workspace/source/docs/usage/enrichments.md:80
Important: You must enable generate_picture_images=True for picture classification to work.

Picture Description

Overview

The picture description enrichment:
  • Generates natural language descriptions (captions)
  • Uses vision-language models (VLMs)
  • Supports both local and remote models
  • Customizable prompts for specific use cases

Default Configuration

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.do_picture_description = True  # Uses default model

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options
        )
    }
)

result = converter.convert("document.pdf")
doc = result.document

# Access descriptions
for item, level in doc.iterate_items():
    if isinstance(item, PictureItem) and hasattr(item, 'description'):
        print(f"Description: {item.description}")
Source: ~/workspace/source/docs/usage/enrichments.md:114

Model Options

SmolVLM (Default)

Lightweight 256M parameter model, good for general descriptions. Model: HuggingFaceTB/SmolVLM-256M-Instruct
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    smolvlm_picture_description
)

pipeline_options = PdfPipelineOptions()
pipeline_options.picture_description_options = smolvlm_picture_description
pipeline_options.do_picture_description = True
Source: ~/workspace/source/docs/usage/enrichments.md:150

Granite Vision

Higher quality 2B parameter model for detailed descriptions. Model: ibm-granite/granite-vision-3.1-2b-preview
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    granite_picture_description
)

pipeline_options = PdfPipelineOptions()
pipeline_options.picture_description_options = granite_picture_description
pipeline_options.do_picture_description = True
Source: ~/workspace/source/docs/usage/enrichments.md:138

Custom VLM Models

Use any vision-language model from Hugging Face:
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionVlmOptions
)

pipeline_options = PdfPipelineOptions()
pipeline_options.picture_description_options = PictureDescriptionVlmOptions(
    repo_id="your-org/your-vlm-model",
    prompt="Describe the image in three sentences. Be concise and accurate.",
)
pipeline_options.do_picture_description = True
Source: ~/workspace/source/docs/usage/enrichments.md:164

Remote API Models

Connect to remote inference servers or cloud providers:
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionApiOptions
)

pipeline_options = PdfPipelineOptions()

# Required for remote connections
pipeline_options.enable_remote_services = True

pipeline_options.picture_description_options = PictureDescriptionApiOptions(
    url="http://localhost:8000/v1/chat/completions",  # vLLM, Ollama, etc.
    params=dict(
        model="your-model-name",
        seed=42,
        max_completion_tokens=200,
    ),
    prompt="Describe the image in three sentences. Be concise and accurate.",
    timeout=90,
)
pipeline_options.do_picture_description = True
Source: ~/workspace/source/docs/usage/enrichments.md:176
Remote API options may send your document data to external services. Ensure compliance with your data privacy requirements.

Enrichment Architecture

Base Classes

Enrichment models inherit from base classes that define the enrichment interface: Source: ~/workspace/source/docling/models/base_model.py:150
class GenericEnrichmentModel(ABC, Generic[EnrichElementT]):
    elements_batch_size: int
    
    @abstractmethod
    def is_processable(self, doc: DoclingDocument, element: NodeItem) -> bool:
        """Determine if element should be processed."""
        pass
    
    @abstractmethod
    def prepare_element(
        self, conv_res: ConversionResult, element: NodeItem
    ) -> Optional[EnrichElementT]:
        """Prepare element for batch processing."""
        pass
    
    @abstractmethod
    def __call__(
        self, doc: DoclingDocument, element_batch: Iterable[EnrichElementT]
    ) -> Iterable[NodeItem]:
        """Process batch of elements and return enriched items."""
        pass

Enrichment Types

Node-Based Enrichment

For enrichments that only need document structure: Source: ~/workspace/source/docling/models/base_model.py:170
class BaseEnrichmentModel(GenericEnrichmentModel[NodeItem]):
    """Enrichment that processes document nodes."""
    
    def prepare_element(
        self, conv_res: ConversionResult, element: NodeItem
    ) -> Optional[NodeItem]:
        if self.is_processable(doc=conv_res.document, element=element):
            return element
        return None

Image-Based Enrichment

For enrichments that need image data: Source: ~/workspace/source/docling/models/base_model.py:179
class BaseItemAndImageEnrichmentModel(
    GenericEnrichmentModel[ItemAndImageEnrichmentElement]
):
    """Enrichment that processes items with their images."""
    
    images_scale: float
    expansion_factor: float = 0.0  # Expand bounding box
    
    def prepare_element(
        self, conv_res: ConversionResult, element: NodeItem
    ) -> Optional[ItemAndImageEnrichmentElement]:
        # Crops image from page using element bounding box
        # Returns ItemAndImageEnrichmentElement(item=element, image=cropped_image)
        ...

Combining Enrichments

Enable multiple enrichments together:
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()

# Enable multiple enrichments
pipeline_options.do_code_enrichment = True
pipeline_options.do_formula_enrichment = True
pipeline_options.generate_picture_images = True
pipeline_options.do_picture_classification = True
pipeline_options.do_picture_description = True

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options
        )
    }
)

result = converter.convert("document.pdf")

Performance Considerations

Each enrichment adds processing time:
  • Code/Formula: Relatively fast (transformers inference)
  • Picture Classification: Fast (small model ~100MB)
  • Picture Description: Slower (VLM inference per image)
For documents with many images, picture description can significantly increase processing time.
  • Each enrichment model loads into memory
  • VLMs can use 500MB - 8GB+ depending on model size
  • Consider GPU acceleration for faster VLM processing
  • Use smaller models (SmolVLM) for resource-constrained environments
Enrichments process elements in batches:
from docling.datamodel.settings import settings

# Adjust batch size for enrichments
settings.perf.elements_batch_size = 32  # Default varies by model
Filter which elements to enrich:
# Only enrich specific picture types
from docling_core.types.doc import PictureItem

# After conversion, manually filter
for item, level in doc.iterate_items():
    if isinstance(item, PictureItem):
        if item.classification in ["Chart", "Diagram"]:
            # Enrich only charts and diagrams
            pass

Developing Custom Enrichments

Create your own enrichment models by implementing the base classes: Example References:

Basic Structure

from docling.models.base_model import BaseEnrichmentModel
from docling_core.types.doc import DoclingDocument, NodeItem, TextItem
from typing import Iterable, Optional
from docling.datamodel.document import ConversionResult

class CustomEnrichmentModel(BaseEnrichmentModel):
    def __init__(self, **kwargs):
        super().__init__()
        # Initialize your model here
        self.model = load_your_model()
    
    def is_processable(self, doc: DoclingDocument, element: NodeItem) -> bool:
        """Determine if this element should be enriched."""
        # Example: only process TextItem elements
        return isinstance(element, TextItem)
    
    def __call__(
        self, doc: DoclingDocument, element_batch: Iterable[NodeItem]
    ) -> Iterable[NodeItem]:
        """Process batch of elements."""
        batch_list = list(element_batch)
        
        # Your enrichment logic here
        for element in batch_list:
            # Enrich the element
            element.enriched_data = self.model.process(element.text)
        
        return batch_list

Integration

Register your enrichment with the pipeline:
# Add to pipeline_options or integrate via plugin system
pipeline_options.custom_enrichment_model = CustomEnrichmentModel()
See the Plugin System guide for more advanced integration.

Plugin System

Extend Docling with custom plugins

Model Catalog

Available models for enrichments

Pipeline Options

Configure enrichment pipelines

GPU Acceleration

Speed up enrichments with GPU

Build docs developers (and LLMs) love