Docling makes document conversion straightforward with a simple, unified API. Whether you’re converting a single file or processing hundreds of documents, the workflow is the same:
1. Create a DocumentConverter instance.
2. Call convert() or convert_all() with your source(s).
3. Access the resulting DoclingDocument from the ConversionResult.
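As a minimal sketch, the three steps can be wrapped in one helper. The name convert_to_markdown and its optional converter argument are illustrative assumptions, not part of Docling. The sketch also assumes the DoclingDocument is exported with export_to_markdown(). Docling is imported lazily so that a stub converter can be passed in for testing.

```python
from typing import Any, Optional


def convert_to_markdown(source: Any, converter: Optional[Any] = None) -> str:
    """Run the three workflow steps and return the document as Markdown.

    `convert_to_markdown` is a hypothetical helper, not a Docling function.
    """
    if converter is None:
        # Step 1: create a DocumentConverter instance (lazy import).
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    # Step 2: call convert() with a path or URL.
    result = converter.convert(source)

    # Step 3: access the DoclingDocument on the ConversionResult and export it.
    return result.document.export_to_markdown()
```

Calling convert_to_markdown("report.pdf") with no converter builds a default DocumentConverter; report.pdf is a placeholder path.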
Docling automatically detects the document format based on file extension or content type. Supported formats include PDF, DOCX, PPTX, XLSX, HTML, Markdown, images, and more.
The convert() method returns a ConversionResult object containing:
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    source = "document.pdf"  # placeholder: a local path or URL
    result = converter.convert(source)

    # The converted document
    doc = result.document

    # Conversion status (SUCCESS, FAILURE, PARTIAL_SUCCESS)
    print(result.status)

    # Input document metadata
    print(result.input.file.name)
    print(result.input.format)

    # Any errors encountered during conversion
    if result.errors:
        for error in result.errors:
            print(f"Error: {error.error_message}")
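For logging or reporting, the fields above can be gathered into a plain dictionary. This is a sketch only: summarize_result is a hypothetical helper, and it reads the status through the enum member's name so that it does not need to import Docling.

```python
from typing import Any, Dict


def summarize_result(result: Any) -> Dict[str, Any]:
    """Collect the ConversionResult fields described above into a dict.

    `summarize_result` is an illustrative helper, not part of Docling.
    """
    return {
        "name": result.input.file.name,
        "format": str(result.input.format),
        # Enum members expose .name, e.g. "SUCCESS" or "PARTIAL_SUCCESS".
        "status": result.status.name,
        # An empty or missing error list becomes an empty list.
        "errors": [error.error_message for error in (result.errors or [])],
    }
```

A PARTIAL_SUCCESS result still carries a document, so checking the errors list alongside the status shows what was lost.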
Use convert_all() to process multiple documents efficiently:
    from pathlib import Path

    from docling.document_converter import DocumentConverter
    from docling.datamodel.base_models import ConversionStatus

    sources = [
        Path("document1.pdf"),
        Path("document2.docx"),
        "https://example.com/doc3.pdf",
    ]

    converter = DocumentConverter()

    # Returns an iterator of ConversionResult objects
    for result in converter.convert_all(sources, raises_on_error=False):
        if result.status == ConversionStatus.SUCCESS:
            print(f"Converted: {result.input.file.name}")
            # Process the document
            doc = result.document
        else:
            print(f"Failed: {result.input.file.name}")
convert_all() returns an iterator, not a list. Results are yielded as documents are converted, allowing you to process large batches without loading everything into memory.
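Because results arrive one at a time, each document can be written out and released before the next one is converted. The sketch below shows one way to do that. export_batch is a hypothetical helper, and it assumes export_to_markdown() as the export method and result.input.file.stem as the base name for the output file.

```python
from pathlib import Path
from typing import Any, Dict, Iterable


def export_batch(results: Iterable[Any], out_dir: Path) -> Dict[str, int]:
    """Write each successful result to Markdown as it arrives and count outcomes.

    `export_batch` is an illustrative helper, not part of Docling.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    counts: Dict[str, int] = {}
    for result in results:  # consumes the iterator lazily; nothing is accumulated
        status = result.status.name
        counts[status] = counts.get(status, 0) + 1
        if status == "SUCCESS":
            target = out_dir / f"{result.input.file.stem}.md"
            target.write_text(result.document.export_to_markdown(), encoding="utf-8")
    return counts
```

A call such as export_batch(converter.convert_all(sources, raises_on_error=False), Path("out")) would stream the batch straight to disk. Inputs that share a file stem would overwrite each other in this sketch.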
Control which documents get processed by setting limits:
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    source = "document.pdf"  # placeholder: a local path or URL

    result = converter.convert(
        source,
        max_num_pages=100,       # Skip documents with more than 100 pages
        max_file_size=20971520,  # Skip files larger than 20 MB (20 * 1024 * 1024 bytes)
    )
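The byte limit is easier to read when it is derived from a size in megabytes. The following sketch wraps the call with that arithmetic and returns None when the document is not converted. convert_within_limits and BYTES_PER_MIB are hypothetical names, and the sketch assumes convert() accepts raises_on_error in the same way convert_all() does.

```python
from typing import Any, Optional

BYTES_PER_MIB = 1024 * 1024


def convert_within_limits(
    converter: Any,
    source: Any,
    max_pages: int = 100,
    max_mib: int = 20,
) -> Optional[Any]:
    """Convert `source` under page and size limits; return the document or None.

    `convert_within_limits` is an illustrative helper, not part of Docling.
    """
    result = converter.convert(
        source,
        max_num_pages=max_pages,
        max_file_size=max_mib * BYTES_PER_MIB,  # 20 * 1024 * 1024 = 20971520 bytes
        raises_on_error=False,  # assumption: report via result.status instead of raising
    )
    # Only a full success yields a document here; anything else returns None.
    return result.document if result.status.name == "SUCCESS" else None
```

Returning None keeps the caller's loop simple, at the cost of hiding why a document was rejected; inspect result.status and result.errors directly when the reason matters.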