This guide walks you through your first document conversion using Docling. You’ll learn how to convert a PDF to Markdown with just a few lines of Python code.

Prerequisites

Before you begin, ensure you have:
  • Python 3.10 or higher installed
  • Docling installed (see Installation)
pip install docling

Your first conversion

Let’s start with the simplest possible example: converting a document and exporting it to Markdown.
Step 1: Create a Python script

Create a new file called convert_document.py:
from docling.document_converter import DocumentConverter

# Specify your document source (URL or local path)
source = "https://arxiv.org/pdf/2408.09869"

# Create a converter and process the document
converter = DocumentConverter()
result = converter.convert(source)

# Export to Markdown and print
print(result.document.export_to_markdown())
Step 2: Run the script

Execute your script:
python convert_document.py
You should see the document content in Markdown format printed to your console.
Step 3: Understand the output

The result object contains:
  • result.document - The structured DoclingDocument with all content and metadata
  • result.status - Conversion status (SUCCESS, PARTIAL_SUCCESS, or FAILURE)
  • result.input - Information about the source document
Supported sources: You can use URLs, local file paths, or file-like objects as input. Docling auto-detects the format.
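
Of the three source types, the file-like case is the least obvious. The following is a minimal sketch, assuming the `DocumentStream` wrapper from `docling.datamodel.base_models` is used to hand in-memory bytes to the converter. The helper name `convert_bytes` is hypothetical, and the Docling imports sit inside the function so the module still loads where Docling is not installed.

```python
from io import BytesIO


def convert_bytes(data: bytes, name: str) -> str:
    """Convert an in-memory document to Markdown (hypothetical helper)."""
    # Lazy imports: assumption that these modules expose the names below.
    from docling.datamodel.base_models import DocumentStream
    from docling.document_converter import DocumentConverter

    # `name` labels the stream; keeping a file extension such as ".pdf"
    # gives format detection an extra hint.
    stream = DocumentStream(name=name, stream=BytesIO(data))
    result = DocumentConverter().convert(stream)
    return result.document.export_to_markdown()


# Usage (requires Docling and a real PDF on disk):
# markdown = convert_bytes(open("document.pdf", "rb").read(), "document.pdf")
```

This can be useful when the document arrives from an upload or a database rather than from disk.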

Converting local files

To convert a local file instead of a URL:
from pathlib import Path
from docling.document_converter import DocumentConverter

# Use a local file path
source = Path("/path/to/your/document.pdf")

converter = DocumentConverter()
result = converter.convert(source)

print(result.document.export_to_markdown())

Export formats

Docling supports multiple export formats, including Markdown, JSON, and HTML. Here’s how to export to Markdown:
# Export to Markdown
markdown_content = result.document.export_to_markdown()
print(markdown_content)

# Save to file
result.document.save_as_markdown("output.md")
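
Other formats can be produced in memory as well. This is a hedged sketch, assuming `DoclingDocument` also offers `export_to_html()`, `export_to_text()`, and `export_to_dict()` alongside `export_to_markdown()`; the helper name `export_in_memory` is hypothetical and not part of Docling.

```python
import json


def export_in_memory(doc) -> dict[str, str]:
    """Return several serializations of a document as strings (hypothetical helper)."""
    return {
        "markdown": doc.export_to_markdown(),
        "html": doc.export_to_html(),
        "text": doc.export_to_text(),
        # export_to_dict() is assumed to return JSON-serializable Python data
        "json": json.dumps(doc.export_to_dict(), ensure_ascii=False),
    }


# Usage, given `result` from converter.convert(...):
# exports = export_in_memory(result.document)
# print(exports["html"][:200])
```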

Working with document content

The DoclingDocument provides structured access to all document elements:
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("document.pdf")
doc = result.document

# Access document metadata
print(f"Document name: {doc.name}")
print(f"Number of pages: {len(doc.pages)}")

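# Iterate through document elements
for element, level in doc.iterate_items():
    # Not every element carries text (tables and pictures do not), so fall back to ""
    text = getattr(element, "text", "")
    print(f"{'  ' * level}{element.label}: {text[:50]}...")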

# Access specific element types
print(f"\nFound {len(doc.tables)} tables")
for i, table in enumerate(doc.tables):
    print(f"Table {i + 1}:")
    # Export table to pandas DataFrame
    df = table.export_to_dataframe(doc)
    print(df.head())

print(f"\nFound {len(doc.pictures)} figures/images")
for i, picture in enumerate(doc.pictures):
    # caption_text() resolves the picture's caption references against the document
    print(f"Figure {i + 1}: {picture.caption_text(doc) or 'No caption'}")
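
To get a quick overview of a document's structure, the items yielded by `iterate_items()` can be tallied by label. This is a small sketch; the helper name `count_labels` is hypothetical, and it assumes only that each item has a `label` attribute, as used above.

```python
from collections import Counter


def count_labels(doc) -> Counter:
    """Count document items by label (hypothetical helper)."""
    counts = Counter()
    for item, _level in doc.iterate_items():
        # Use the enum's underlying value when the label has one,
        # otherwise use the label itself
        label = getattr(item.label, "value", item.label)
        counts[str(label)] += 1
    return counts


# Usage, given `doc = result.document`:
# for label, n in count_labels(doc).most_common():
#     print(f"{label}: {n}")
```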

Batch processing

Convert multiple documents efficiently:
from pathlib import Path
from docling.datamodel.base_models import ConversionStatus
from docling.document_converter import DocumentConverter

# List of documents to convert
documents = [
    "document1.pdf",
    "document2.docx",
    "https://example.com/document3.pdf",
]

converter = DocumentConverter()
output_dir = Path("output")
output_dir.mkdir(parents=True, exist_ok=True)

# Convert all documents; raises_on_error=False reports failures through the
# status instead of raising on the first failed document
for conv_result in converter.convert_all(documents, raises_on_error=False):
    doc_filename = conv_result.input.file.stem

    if conv_result.status == ConversionStatus.SUCCESS:
        # Save successful conversions
        conv_result.document.save_as_markdown(output_dir / f"{doc_filename}.md")
        print(f"✓ Converted {doc_filename}")
    else:
        print(f"✗ Failed to convert {doc_filename}")
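
The loop above names each output file after the input's stem, so two inputs that share a stem (say `a/report.pdf` and `b/report.docx`) would write to the same file. One possible guard is sketched below; `unique_output_path` is a hypothetical helper using only the standard library, not a Docling function.

```python
from pathlib import Path


def unique_output_path(output_dir: Path, stem: str, taken: set[str]) -> Path:
    """Return output_dir/<stem>.md, adding a numeric suffix if the stem was already used."""
    candidate = stem
    counter = 1
    while candidate in taken:
        counter += 1
        candidate = f"{stem}_{counter}"
    # Remember the name so later calls with the same stem get a new suffix
    taken.add(candidate)
    return output_dir / f"{candidate}.md"


# Usage inside the batch loop:
# taken: set[str] = set()
# path = unique_output_path(Path("output"), conv_result.input.file.stem, taken)
# conv_result.document.save_as_markdown(path)
```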

Using the CLI

Docling includes a command-line interface for quick conversions without writing code:
For example, to convert a document to Markdown:
docling https://arxiv.org/pdf/2408.09869
For more CLI options, see the CLI reference.

What’s next?

Now that you’ve completed your first conversion, explore these topics:

  • Basic conversion guide - Learn about format support, input types, and conversion options
  • Advanced options - Configure pipelines, OCR engines, and processing settings
  • Export formats - Deep dive into all export formats and customization options
  • DoclingDocument - Understand the unified document representation
  • Batch processing - Process multiple documents efficiently at scale
  • PDF processing - Advanced PDF features: layout, tables, formulas, and more
  • OCR configuration - Configure OCR engines for scanned documents and images
  • RAG integration - Build RAG applications with LangChain and LlamaIndex

Common patterns

Error handling

from docling.document_converter import DocumentConverter
from docling.datamodel.base_models import ConversionStatus

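converter = DocumentConverter()
# By default convert() raises on failure; raises_on_error=False returns the
# result with its status and errors so the branches below can be reached
result = converter.convert("document.pdf", raises_on_error=False)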

if result.status == ConversionStatus.SUCCESS:
    print("Conversion successful!")
    markdown = result.document.export_to_markdown()
elif result.status == ConversionStatus.PARTIAL_SUCCESS:
    print("Partial success with errors:")
    for error in result.errors:
        print(f"  - {error.error_message}")
    # The document is still usable
    markdown = result.document.export_to_markdown()
else:
    print("Conversion failed")
    for error in result.errors:
        print(f"  - {error.error_message}")
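
For logging, the status and error messages can be folded into a single line. This is a sketch only; `summarize_result` is a hypothetical helper that relies on the `status`, `errors`, and `error_message` attributes used in the example above.

```python
def summarize_result(result) -> str:
    """Build a one-line summary from a conversion result's status and errors."""
    # Enum members expose .name (e.g. "SUCCESS"); fall back to str() otherwise
    status = getattr(result.status, "name", str(result.status))
    messages = [e.error_message for e in (result.errors or [])]
    if not messages:
        return status
    return f"{status}: " + "; ".join(messages)


# Usage, given `result` from converter.convert(...):
# print(summarize_result(result))
```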

Custom output directory

from pathlib import Path
from docling.document_converter import DocumentConverter

output_dir = Path("converted_docs")
output_dir.mkdir(exist_ok=True)

converter = DocumentConverter()
result = converter.convert("document.pdf")

# Save in multiple formats
base_name = result.input.file.stem
result.document.save_as_markdown(output_dir / f"{base_name}.md")
result.document.save_as_json(output_dir / f"{base_name}.json")
result.document.save_as_html(output_dir / f"{base_name}.html")

Extract specific content

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("document.pdf")
doc = result.document

# Extract all tables as CSV files
for i, table in enumerate(doc.tables):
    df = table.export_to_dataframe(doc)
    df.to_csv(f"table_{i + 1}.csv", index=False)

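# Extract all images (picture.image may be empty unless picture image
# generation is enabled in the pipeline options)
for i, picture in enumerate(doc.pictures):
    if picture.image:
        # Access the image data as a PIL image
        image_data = picture.image.pil_image
        image_data.save(f"figure_{i + 1}.png")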

# Get just the text content
text_only = doc.export_to_markdown(strict_text=True)
# Write as UTF-8 so non-ASCII characters survive on every platform
with open("content.txt", "w", encoding="utf-8") as f:
    f.write(text_only)

