Basic Document Conversion

Overview

Docling makes document conversion straightforward with a simple, unified API. Whether you’re converting a single file or processing hundreds of documents, the workflow is the same:

1. Create a DocumentConverter instance.
2. Call convert() or convert_all() with your source(s).
3. Access the resulting DoclingDocument from the ConversionResult.

Single Document Conversion

The simplest way to convert a document:

from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
result = converter.convert(source)
doc = result.document

print(doc.export_to_markdown())

Docling automatically detects the document format based on file extension or content type. Supported formats include PDF, DOCX, PPTX, XLSX, HTML, Markdown, images, and more.

Understanding ConversionResult

The convert() method returns a ConversionResult object containing:

result = converter.convert(source)

# The converted document
doc = result.document

# Conversion status (SUCCESS, PARTIAL_SUCCESS, FAILURE, or SKIPPED)
print(result.status)

# Input document metadata
print(result.input.file.name)
print(result.input.format)

# Any errors encountered during conversion
if result.errors:
    for error in result.errors:
        print(f"Error: {error.error_message}")
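
For logging, the fields above can be gathered into a single record. The following is a minimal sketch rather than part of Docling: summarize_result is a hypothetical helper, and it assumes only the attributes used in the snippet above, with status and input.format being enum members that expose a .name.

```python
def summarize_result(result):
    """Collect the ConversionResult fields shown above into a plain dict.

    Hypothetical helper: it reads only `status`, `input.file`,
    `input.format`, and `errors[*].error_message`, and does not import
    Docling itself.
    """
    return {
        "file": result.input.file.name,
        "format": result.input.format.name,
        "status": result.status.name,
        "errors": [error.error_message for error in result.errors],
    }
```

A dict like this can be passed to a logger or serialized as JSON without keeping the full result object around.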

Conversion Status

SUCCESS

Document converted completely without errors. All content was extracted successfully.

PARTIAL_SUCCESS

Document converted with some non-critical errors. Most content was extracted, but some elements may be missing or incomplete.

FAILURE

Document conversion failed. The errors list contains details about what went wrong.

SKIPPED

Document was not processed (e.g., format not allowed, file size exceeded limits).
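
When a batch produces a mix of these statuses, grouping file names by status gives a quick overview of what needs attention. This is an illustrative sketch: group_by_status is a hypothetical helper, and it assumes each result exposes status (an enum member) and input.file as described above.

```python
from collections import defaultdict


def group_by_status(results):
    """Group input file names by status name, e.g. {"SUCCESS": ["a.pdf"]}.

    Hypothetical helper: `results` is any iterable of ConversionResult-like
    objects exposing `status` and `input.file`.
    """
    groups = defaultdict(list)
    for result in results:
        groups[result.status.name].append(result.input.file.name)
    return dict(groups)
```

Files listed under PARTIAL_SUCCESS are candidates for manual review, while those under FAILURE or SKIPPED may need a retry with different settings.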

Error Handling

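By default, Docling raises an exception when a conversion fails. The raises_on_error parameter controls this behavior: with raises_on_error=True (the default) a failed conversion raises, and with raises_on_error=False the failure is reported through result.status and result.errors instead.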

from docling.document_converter import DocumentConverter

converter = DocumentConverter()

try:
    result = converter.convert(source, raises_on_error=True)
    doc = result.document
except Exception as e:
    print(f"Conversion failed: {e}")
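
The non-raising variant can be wrapped in a small function. The sketch below is an assumption-laden illustration: convert_or_none is a hypothetical helper, and converter is expected to be a DocumentConverter (or anything with the same convert() signature).

```python
def convert_or_none(converter, source):
    """Return the converted document, or None if conversion did not succeed.

    Hypothetical helper. Passing raises_on_error=False asks the converter to
    report problems through the result instead of raising an exception.
    """
    result = converter.convert(source, raises_on_error=False)
    if result.status.name in ("SUCCESS", "PARTIAL_SUCCESS"):
        return result.document
    # Report whatever the converter recorded, then signal failure to the caller.
    for error in result.errors:
        print(f"{source}: {error.error_message}")
    return None
```

Treating PARTIAL_SUCCESS as usable is a design choice of this sketch; stricter callers may prefer to accept only SUCCESS.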

Processing Multiple Documents

Use convert_all() to process multiple documents efficiently:

from pathlib import Path
from docling.document_converter import DocumentConverter
from docling.datamodel.base_models import ConversionStatus

sources = [
    Path("document1.pdf"),
    Path("document2.docx"),
    "https://example.com/doc3.pdf",
]

converter = DocumentConverter()

# Returns an iterator of ConversionResult objects
for result in converter.convert_all(sources, raises_on_error=False):
    if result.status == ConversionStatus.SUCCESS:
        print(f"Converted: {result.input.file.name}")
        # Process the document
        doc = result.document
    else:
        print(f"Failed: {result.input.file.name}")

convert_all() returns an iterator, not a list. Results are yielded as documents are converted, allowing you to process large batches without loading everything into memory.
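
One way to take advantage of the iterator is to write each document to disk as soon as it arrives. The following is a sketch under stated assumptions: export_batch is a hypothetical helper, converter is expected to be a DocumentConverter, and the output file naming is purely illustrative.

```python
from pathlib import Path


def export_batch(converter, sources, out_dir):
    """Convert sources lazily and write each successful result as Markdown.

    Hypothetical helper. Results are consumed one by one from convert_all(),
    so this loop never keeps more than the current result. Returns the list
    of written paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for result in converter.convert_all(sources, raises_on_error=False):
        if result.status.name != "SUCCESS":
            print(f"Skipping {result.input.file.name}: {result.status.name}")
            continue
        target = out_dir / f"{result.input.file.stem}.md"
        target.write_text(result.document.export_to_markdown(), encoding="utf-8")
        written.append(target)
    return written
```

Collecting the iterator into a list first would also work, but it would hold every converted document in memory at once.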

Document Limits

Control which documents get processed by setting limits:

from docling.document_converter import DocumentConverter

converter = DocumentConverter()

result = converter.convert(
    source,
    max_num_pages=100,           # Skip documents with more than 100 pages
    max_file_size=20971520,      # Skip files larger than 20 MiB (20 * 1024 * 1024 bytes)
)
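
Because max_file_size is given in bytes, spelling out the arithmetic avoids magic numbers. A small sketch, where mib and the limits dict are hypothetical names introduced for illustration:

```python
def mib(n):
    """Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes)."""
    return n * 1024 * 1024


limits = {
    "max_num_pages": 100,
    "max_file_size": mib(20),  # 20 * 1024 * 1024 = 20971520 bytes
}

# With a DocumentConverter in hand, the limits could then be passed as:
# result = converter.convert(source, **limits)
```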

Page Range Selection

Convert only specific pages from a document:

import sys

from docling.document_converter import DocumentConverter

converter = DocumentConverter()

# page_range is a (start, end) tuple of 1-based, inclusive page numbers

# Convert pages 1-10
result = converter.convert(
    source,
    page_range=(1, 10),
)

# Convert from page 5 to the end
result = converter.convert(
    source,
    page_range=(5, sys.maxsize),  # a very large end value means "to the end"
)
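
For long documents, one option is to convert in fixed-size page windows. The sketch below only computes the windows. page_windows is a hypothetical helper, and it assumes page_range accepts a 1-based, inclusive (start, end) tuple, which is worth checking against the installed Docling version.

```python
def page_windows(total_pages, window):
    """Yield 1-based, inclusive (start, end) page ranges covering a document."""
    for start in range(1, total_pages + 1, window):
        yield (start, min(start + window - 1, total_pages))


# A 25-page document split into windows of 10 pages:
print(list(page_windows(25, 10)))  # [(1, 10), (11, 20), (21, 25)]
```

Each tuple could then be passed as page_range in a separate convert() call, assuming the total page count is known in advance.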

Quick Export

Once you have a DoclingDocument, export to various formats:

# Export to Markdown string
markdown = doc.export_to_markdown()
print(markdown)

# Save to file
doc.save_as_markdown("output.md")
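
As a further illustration, a document can be written as Markdown and as JSON side by side. This is a hedged sketch: export_markdown_and_json is a hypothetical helper, and it assumes the document offers export_to_dict() alongside export_to_markdown() and that the returned dict is JSON-serializable.

```python
import json
from pathlib import Path


def export_markdown_and_json(doc, stem, out_dir="."):
    """Write `doc` as <stem>.md and <stem>.json in out_dir.

    Hypothetical helper; it calls export_to_markdown() and export_to_dict()
    on the document and writes the files itself.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    md_path = out_dir / f"{stem}.md"
    json_path = out_dir / f"{stem}.json"
    md_path.write_text(doc.export_to_markdown(), encoding="utf-8")
    json_path.write_text(json.dumps(doc.export_to_dict(), indent=2), encoding="utf-8")
    return md_path, json_path
```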

For detailed information on export formats and options, see the Export Formats guide.

Resource Control

Limit CPU usage for document processing:

# Set number of CPU threads (default: 4)
export OMP_NUM_THREADS=2
python your_script.py

Or in your Python code:

import os

# Must be set before importing docling
os.environ["OMP_NUM_THREADS"] = "2"

from docling.document_converter import DocumentConverter
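
If the thread count may already be configured by the shell or a job scheduler, the in-code default can be applied only when nothing is set. A minimal sketch, where limit_threads is a hypothetical helper name:

```python
import os


def limit_threads(n):
    """Set OMP_NUM_THREADS to n unless the environment already defines it.

    Returns the value that is in effect afterwards.
    """
    return os.environ.setdefault("OMP_NUM_THREADS", str(n))


# Call this before the first docling import so the setting is in place
# when the library loads.
limit_threads(2)
```

Using setdefault keeps an explicit export OMP_NUM_THREADS=... from the shell in charge, with the value in code acting only as a fallback.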

Next Steps

Batch Processing

Learn efficient batch processing techniques for large document collections

Advanced Options

Customize conversion behavior with pipeline options and format-specific settings

Export Formats

Explore all available export formats and their options

PDF Processing

Deep dive into PDF-specific features and options
