Overview
DocumentConverter is the main entry point for converting documents in Docling. It handles various input formats (PDF, DOCX, PPTX, images, HTML, Markdown, etc.) and provides both single-document and batch conversion capabilities.
The conversion methods return a ConversionResult instance for each document, which wraps a DoclingDocument object if the conversion was successful, along with metadata about the conversion process.
Class Definition
Constructor
__init__()
Initialize the converter based on format preferences.
List of allowed input formats. By default, any format supported by Docling is allowed.
Dictionary of format-specific options. Each format can have custom pipeline and backend configurations.
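As an illustration of the two constructor arguments, the following is a minimal sketch rather than a canonical setup. It assumes the keyword names `allowed_formats` and `format_options`, and the helper classes `PdfFormatOption` and `PdfPipelineOptions`, none of which are named in the descriptions above. The import is deferred into the function so the snippet loads even where Docling is not installed.

```python
def build_pdf_only_converter():
    """Build a converter limited to PDF and image input, with OCR turned off for PDFs."""
    # Assumed import paths; deferred so this module loads without docling installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    pipeline_options = PdfPipelineOptions()
    pipeline_options.do_ocr = False  # illustrative choice, e.g. for born-digital PDFs

    return DocumentConverter(
        # Assumed keyword: restricts which input formats are accepted.
        allowed_formats=[InputFormat.PDF, InputFormat.IMAGE],
        # Assumed keyword: per-format pipeline and backend configuration.
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options),
        },
    )
```

Formats left out of the options dictionary would be expected to fall back to their default configuration.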
Attributes
Allowed input formats for conversion.
Mapping of formats to their configuration options.
Cache of initialized pipelines keyed by (pipeline class, options hash).
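To make the cache key concrete, here is a small standalone sketch of a cache keyed by (pipeline class, options hash). It is not Docling's implementation: `DemoOptions`, `DemoPipeline`, `options_hash`, and `PipelineCache` are hypothetical names, and hashing a JSON dump of the options is only one possible way to derive the hash.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DemoOptions:
    """Hypothetical stand-in for a pipeline options object."""
    do_ocr: bool = True
    num_threads: int = 4


class DemoPipeline:
    """Hypothetical stand-in for a pipeline that is expensive to build."""

    def __init__(self, options: DemoOptions) -> None:
        self.options = options


def options_hash(options: DemoOptions) -> str:
    """Hash options by value, so equal settings produce equal keys."""
    payload = json.dumps(asdict(options), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class PipelineCache:
    """Reuse one pipeline instance per (pipeline class, options hash)."""

    def __init__(self) -> None:
        self._pipelines = {}

    def get(self, pipeline_cls, options):
        key = (pipeline_cls, options_hash(options))
        if key not in self._pipelines:
            # Build the pipeline only the first time this key is requested.
            self._pipelines[key] = pipeline_cls(options)
        return self._pipelines[key]


cache = PipelineCache()
first = cache.get(DemoPipeline, DemoOptions(do_ocr=True))
second = cache.get(DemoPipeline, DemoOptions(do_ocr=True))   # equal options: reused
third = cache.get(DemoPipeline, DemoOptions(do_ocr=False))   # different options: new instance
```

Keying on the options hash as well as the class means two formats that share a pipeline class and identical options can share one initialized pipeline.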
Methods
convert()
Convert one document fetched from a file path, URL, or DocumentStream.
Source of input document given as file path, URL, or DocumentStream.
Optional headers given as a dictionary of string key-value pairs, in case of URL input source.
Whether to raise an error on the first conversion failure. If False, errors are captured in the ConversionResult objects.
Maximum number of pages accepted per document. Documents exceeding this number will not be converted.
Maximum file size to convert (in bytes).
Range of pages to convert.
The conversion result, which contains a DoclingDocument in the document attribute, and metadata about the conversion process.
ConversionError: An error occurred during conversion.
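A typical single-document call might look like the sketch below. The keyword names `raises_on_error`, `max_num_pages`, and `max_file_size`, the `ConversionStatus` enum, and `export_to_markdown()` are assumptions, since the parameter descriptions above do not name them. The helper `convert_to_markdown` and its default limits are hypothetical.

```python
def convert_to_markdown(source, max_pages=50, max_bytes=20 * 1024 * 1024):
    """Convert one document and return Markdown, or None if conversion did not succeed."""
    # Assumed import paths; deferred so this module loads without docling installed.
    from docling.datamodel.base_models import ConversionStatus
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(
        source,                    # file path, URL, or DocumentStream
        raises_on_error=False,     # capture failures in the result instead of raising
        max_num_pages=max_pages,   # documents with more pages are not converted
        max_file_size=max_bytes,   # size limit in bytes
    )
    if result.status != ConversionStatus.SUCCESS:
        return None
    return result.document.export_to_markdown()
```

Passing `raises_on_error=False` trades an exception for a status check, which is usually easier to handle when the input comes from untrusted or mixed sources.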
convert_all()
Convert multiple documents from file paths, URLs, or DocumentStreams.
Source of input documents given as an iterable of file paths, URLs, or DocumentStreams.
Optional headers given as a (single) dictionary of string key-value pairs, in case of URL input source.
Whether to raise an error on the first conversion failure.
Maximum number of pages accepted per document. Documents exceeding this number will be skipped.
Maximum file size to convert (in bytes).
Range of pages to convert in each document.
The conversion results, each containing a DoclingDocument in the document attribute and metadata about the conversion process.
ConversionError: An error occurred during conversion.
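For batch use, one plausible pattern is to disable raising and sort results by status, as sketched here. The keyword `raises_on_error`, the `ConversionStatus` enum, and the `result.input.file` attribute are assumptions not spelled out above, and `convert_batch` is a hypothetical helper.

```python
def convert_batch(sources):
    """Convert several documents; return (markdown_by_name, failed_names)."""
    # Assumed import paths; deferred so this module loads without docling installed.
    from docling.datamodel.base_models import ConversionStatus
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    markdown_by_name = {}
    failed_names = []

    # With raises_on_error=False, one bad document does not abort the whole batch.
    for result in converter.convert_all(sources, raises_on_error=False):
        name = result.input.file.name  # assumed: the input file is exposed as a path
        if result.status == ConversionStatus.SUCCESS:
            markdown_by_name[name] = result.document.export_to_markdown()
        else:
            failed_names.append(name)

    return markdown_by_name, failed_names
```

Reusing one converter for the whole batch lets every document benefit from pipelines that were already initialized for earlier ones.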
convert_string()
Convert a document given as a string using the specified format.
Only Markdown (InputFormat.MD) and HTML (InputFormat.HTML) formats are supported. The content is wrapped in a DocumentStream and passed to the main conversion pipeline.
The document content as a string.
The format of the input content. Must be either InputFormat.MD or InputFormat.HTML.
The filename to associate with the document. If not provided, a timestamp-based name is generated. The appropriate file extension (md or html) is appended if missing.
The conversion result, which contains a DoclingDocument in the document attribute, and metadata about the conversion process.
ValueError: If format is neither InputFormat.MD nor InputFormat.HTML.
ConversionError: An error occurred during conversion.
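The naming and validation rules described for convert_string() can be modeled in a few lines. This is a standalone sketch, not the library's code: `resolve_name` is a hypothetical helper, the strings "md" and "html" stand in for InputFormat.MD and InputFormat.HTML, and the `document_<timestamp>` pattern is an assumed shape for the generated name.

```python
import time

# Stand-ins for InputFormat.MD and InputFormat.HTML, mapped to their extensions.
EXTENSIONS = {"md": ".md", "html": ".html"}


def resolve_name(fmt, name=None):
    """Return the filename to associate with string content of the given format."""
    if fmt not in EXTENSIONS:
        # Mirrors the documented ValueError for unsupported formats.
        raise ValueError(f"unsupported format: {fmt!r}")
    ext = EXTENSIONS[fmt]
    if name is None:
        # Assumed naming pattern; the text only says the name is timestamp-based.
        name = f"document_{int(time.time())}"
    if not name.lower().endswith(ext):
        # Append the extension only when it is missing.
        name += ext
    return name
```

For example, `resolve_name("md", "notes")` yields `notes.md`, while `resolve_name("html", "page.html")` is returned unchanged.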
initialize_pipeline()
Initialize the conversion pipeline for the selected format.
The input format for which to initialize the pipeline.
ConversionError: If no pipeline could be initialized for the given format.
RuntimeError: If artifacts_path is set in docling.datamodel.settings.settings when required by the pipeline, but points to a non-directory file.
FileNotFoundError: If local model files are not found.
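One possible use of this method is to initialize a pipeline up front, so that model-loading problems surface before the first document is converted. The sketch below assumes the import path `docling.exceptions.ConversionError` and that the format is passed positionally; `warm_up_pdf_pipeline` is a hypothetical helper.

```python
def warm_up_pdf_pipeline(converter):
    """Initialize the PDF pipeline ahead of time; return True on success, False otherwise."""
    # Assumed import paths; deferred so this module loads without docling installed.
    from docling.datamodel.base_models import InputFormat
    from docling.exceptions import ConversionError

    try:
        converter.initialize_pipeline(InputFormat.PDF)
    except (ConversionError, RuntimeError, FileNotFoundError):
        # The three documented failure modes: no pipeline for the format,
        # an invalid artifacts_path, or missing local model files.
        return False
    return True
```

In a long-running service, calling a helper like this at startup would move the initialization cost out of the first request.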
Complete Example
See Also
- ConversionResult - Result object returned by conversion methods
- DoclingDocument - The document object containing converted content