Convert your first document with Docling in minutes
This guide walks you through your first document conversion using Docling. You’ll learn how to convert a PDF to Markdown with just a few lines of Python code.
Let’s start with the simplest possible example: converting a document and exporting it to Markdown.
1. Create a Python script
Create a new file called convert_document.py:
from docling.document_converter import DocumentConverter

# Specify your document source (URL or local path)
source = "https://arxiv.org/pdf/2408.09869"

# Create a converter and process the document
converter = DocumentConverter()
result = converter.convert(source)

# Export to Markdown and print
print(result.document.export_to_markdown())
2. Run the script
Execute your script:
python convert_document.py
You should see the document content in Markdown format printed to your console.
3. Understand the output
The result object contains:
result.document - The structured DoclingDocument with all content and metadata
result.status - Conversion status (SUCCESS, PARTIAL_SUCCESS, or FAILURE)
result.input - Information about the source document
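As a minimal sketch of how the status field might be checked before using the output, the helper below maps a status to a short message. `describe_status` is a hypothetical helper, not part of Docling. It compares the status by name so that it runs without Docling installed; in real code you would more likely compare against the `ConversionStatus` enum, which is assumed here to live in `docling.datamodel.base_models`.

```python
from types import SimpleNamespace


def describe_status(result) -> str:
    """Return a short, human-readable summary of a conversion result.

    Hypothetical helper: it reads only `result.status` and compares by name,
    so it accepts either an enum member or a plain string.
    """
    status = result.status
    # Enum members expose .name; plain strings fall back to their own value.
    name = getattr(status, "name", str(status)).upper()
    if name == "SUCCESS":
        return "Conversion succeeded."
    if name == "PARTIAL_SUCCESS":
        return "Conversion partially succeeded; review the output for gaps."
    if name == "FAILURE":
        return "Conversion failed."
    return f"Unexpected status: {name}"


# Stand-in object for demonstration; a real result comes from converter.convert().
fake_result = SimpleNamespace(status="PARTIAL_SUCCESS")
print(describe_status(fake_result))
```

Checking the status first is useful mainly in batch jobs, where a partially converted document should be flagged rather than silently exported.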
Supported sources: You can use URLs, local file paths, or file-like objects as input. Docling auto-detects the format.
from pathlib import Path

from docling.document_converter import DocumentConverter

# Use a local file path
source = Path("/path/to/your/document.pdf")

converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
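When a script accepts either a URL or a local path from the user, it can help to normalize the input and fail early on a missing file. The sketch below is one way to do that; `normalize_source` and `convert_to_markdown` are hypothetical helpers, not part of Docling, and only `http` and `https` are treated as URLs by assumption.

```python
from pathlib import Path
from typing import Union
from urllib.parse import urlparse


def normalize_source(source: Union[str, Path]) -> Union[str, Path]:
    """Return http(s) URLs unchanged and turn anything else into a Path."""
    if isinstance(source, Path):
        return source
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return source
    # Anything without an http(s) scheme is treated as a local path.
    return Path(source)


def convert_to_markdown(source: Union[str, Path]) -> str:
    """Convert a URL or local file to Markdown, checking local paths first."""
    # Imported lazily so normalize_source can be used without Docling installed.
    from docling.document_converter import DocumentConverter

    src = normalize_source(source)
    if isinstance(src, Path) and not src.is_file():
        raise FileNotFoundError(f"No such file: {src}")
    result = DocumentConverter().convert(src)
    return result.document.export_to_markdown()
```

A call such as `convert_to_markdown("document.pdf")` would then raise a clear `FileNotFoundError` before any conversion work starts if the file is missing.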
Docling supports multiple export formats. Here’s how to use them:
# Export to Markdown
markdown_content = result.document.export_to_markdown()
print(markdown_content)

# Save to file
result.document.save_as_markdown("output.md")
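Beyond Markdown, a document can also be serialized to other formats. The following is a sketch under the assumption that the installed version exposes `export_to_dict()` and `export_to_html()` alongside `export_to_markdown()`; `render_exports` and `write_exports` are hypothetical helpers, not Docling functions.

```python
import json
from pathlib import Path


def render_exports(doc, stem: str = "output") -> dict:
    """Map output filenames to serialized content for one document.

    Assumes `doc` provides export_to_markdown(), export_to_dict(),
    and export_to_html().
    """
    return {
        f"{stem}.md": doc.export_to_markdown(),
        f"{stem}.json": json.dumps(doc.export_to_dict(), ensure_ascii=False, indent=2),
        f"{stem}.html": doc.export_to_html(),
    }


def write_exports(doc, out_dir, stem: str = "output") -> list:
    """Write every rendered export into out_dir and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, content in render_exports(doc, stem).items():
        path = out / filename
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written
```

With a converted document in hand, `write_exports(result.document, "exports")` would produce `output.md`, `output.json`, and `output.html` in an `exports` directory.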
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("document.pdf")
doc = result.document

# Extract all tables as CSV files (export_to_dataframe requires pandas)
for i, table in enumerate(doc.tables):
    df = table.export_to_dataframe(doc)
    df.to_csv(f"table_{i + 1}.csv", index=False)

# Extract all images
for i, picture in enumerate(doc.pictures):
    if picture.image:
        # Access the image data
        image_data = picture.image.pil_image
        image_data.save(f"figure_{i + 1}.png")

# Get just the text content
text_only = doc.export_to_markdown(strict_text=True)
# Write as UTF-8 explicitly so non-ASCII text is saved consistently across platforms
with open("content.txt", "w", encoding="utf-8") as f:
    f.write(text_only)
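If the image loop above writes no files, one likely cause is that picture images were never generated during conversion. The sketch below assumes the PDF pipeline exposes the `generate_picture_images` and `images_scale` options on `PdfPipelineOptions`, which may differ between Docling versions. `build_image_converter` and `figure_filename` are hypothetical helper names.

```python
def build_image_converter(images_scale: float = 2.0):
    """Return a DocumentConverter configured to keep picture images for PDFs.

    Assumption: PdfPipelineOptions has generate_picture_images and images_scale.
    """
    # Imported inside the function so this module loads without Docling installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    pipeline_options = PdfPipelineOptions()
    pipeline_options.images_scale = images_scale  # scale factor for rendered images
    pipeline_options.generate_picture_images = True  # keep image data on each picture
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )


def figure_filename(index: int, stem: str = "figure") -> str:
    """Build a 1-based output filename for the picture at a 0-based index."""
    return f"{stem}_{index + 1}.png"
```

Replacing `DocumentConverter()` with `build_image_converter()` in the script above should then make `picture.image` available for the pictures found in a PDF.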