Overview
The LlamaIndex Docling integration provides:
- Docling Reader - Load documents with high-fidelity structural preservation
- Docling Node Parser - Parse documents into LlamaIndex nodes with structure awareness
- Lossless Serialization - Preserve complete document structure as JSON
- Flexible Export - Export to simplified formats like Markdown when needed
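Installation
Install both components from PyPI, for example with pip install llama-index-readers-docling llama-index-node-parser-docling.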
Components
Docling Reader
The Docling Reader loads document files and populates LlamaIndex Document objects with Docling’s rich data model.
Basic Usage
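A minimal sketch of loading a file with the reader, assuming the llama-index-readers-docling package is installed. The helper name load_with_docling and the example path are hypothetical. The import is deferred into the function so the snippet can be defined even where the package is not available.

```python
def load_with_docling(source):
    """Load a file path or URL into a list of LlamaIndex Document objects."""
    # Deferred import: requires the llama-index-readers-docling package.
    from llama_index.readers.docling import DoclingReader

    # With no arguments the reader uses its default export type
    # (assumed here to be Markdown).
    reader = DoclingReader()
    return reader.load_data(file_path=source)
```

Calling load_with_docling("report.pdf") (a hypothetical file) should return a list of Document objects whose text holds the exported content.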
Export Formats
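The choice between lossless JSON and simplified Markdown export could be sketched as follows. This assumes the reader exposes a DoclingReader.ExportType enum with JSON and MARKDOWN members. The helpers export_type_name and make_reader are hypothetical names introduced only for illustration.

```python
def export_type_name(lossless):
    """Return the name of the export type member to request."""
    # JSON keeps the complete document structure; Markdown is a simplified view.
    return "JSON" if lossless else "MARKDOWN"


def make_reader(lossless=True):
    """Build a DoclingReader configured for the requested export format."""
    # Deferred import: requires the llama-index-readers-docling package.
    from llama_index.readers.docling import DoclingReader

    # Assumption: ExportType has members named JSON and MARKDOWN.
    export_type = getattr(DoclingReader.ExportType, export_type_name(lossless))
    return DoclingReader(export_type=export_type)
```

JSON export is the natural choice when the documents will later go through the Docling Node Parser. Markdown is convenient when the text is consumed directly.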
Docling Node Parser
The Docling Node Parser uses knowledge of Docling’s format to parse documents into LlamaIndex Node objects for downstream usage, following the document’s structure.
Basic Usage
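A minimal sketch of parsing loaded documents into nodes, assuming both packages are installed and that the node parser works on documents exported as Docling JSON. The helper name parse_into_nodes is hypothetical.

```python
def parse_into_nodes(source):
    """Load a document with Docling and split it into LlamaIndex nodes."""
    # Deferred imports: require llama-index-readers-docling and
    # llama-index-node-parser-docling.
    from llama_index.node_parser.docling import DoclingNodeParser
    from llama_index.readers.docling import DoclingReader

    # Assumption: the node parser needs the lossless JSON export, not Markdown.
    reader = DoclingReader(export_type=DoclingReader.ExportType.JSON)
    documents = reader.load_data(file_path=source)

    node_parser = DoclingNodeParser()
    # get_nodes_from_documents is the standard LlamaIndex NodeParser entry point.
    return node_parser.get_nodes_from_documents(documents)
```

Each returned node carries a chunk of text together with its metadata, which can be inspected through node.metadata.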
Advanced Parsing Options
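One likely customization point is the chunker that the node parser uses. The sketch below assumes DoclingNodeParser accepts a chunker argument and that docling_core provides HierarchicalChunker under docling_core.transforms.chunker. Both should be checked against the installed versions. The helper name make_node_parser is hypothetical.

```python
def make_node_parser(chunker=None):
    """Build a DoclingNodeParser, optionally with a caller-supplied chunker."""
    # Deferred import: requires llama-index-node-parser-docling.
    from llama_index.node_parser.docling import DoclingNodeParser

    if chunker is None:
        # Assumption: this import path and class exist in docling-core.
        from docling_core.transforms.chunker import HierarchicalChunker

        chunker = HierarchicalChunker()

    # Assumption: the parser takes the chunker as a keyword argument.
    return DoclingNodeParser(chunker=chunker)
```

Passing a different chunker instance lets the chunking strategy change without touching the rest of the pipeline.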
Complete RAG Pipeline
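An end-to-end pipeline could be sketched as below, assuming llama-index-core and both Docling packages are installed. The helper name build_query_engine is hypothetical, and the embedding model and LLM are left to the caller. When they are omitted, LlamaIndex falls back to its globally configured defaults, which may require API credentials.

```python
def build_query_engine(source, embed_model=None, llm=None):
    """Load a document, chunk it with Docling, index it, and return a query engine."""
    # Deferred imports: require llama-index-core and both Docling packages.
    from llama_index.core import VectorStoreIndex
    from llama_index.node_parser.docling import DoclingNodeParser
    from llama_index.readers.docling import DoclingReader

    # JSON export keeps the structure that the node parser relies on.
    reader = DoclingReader(export_type=DoclingReader.ExportType.JSON)
    documents = reader.load_data(file_path=source)

    # Only pass the optional models when the caller supplied them.
    index_kwargs = {"transformations": [DoclingNodeParser()]}
    if embed_model is not None:
        index_kwargs["embed_model"] = embed_model
    index = VectorStoreIndex.from_documents(documents, **index_kwargs)

    engine_kwargs = {}
    if llm is not None:
        engine_kwargs["llm"] = llm
    return index.as_query_engine(**engine_kwargs)
```

A call such as build_query_engine("report.pdf").query("What are the main findings?") (hypothetical file and question) would then run retrieval and generation over the parsed nodes.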
A complete pipeline combines both components: the Docling Reader loads the documents and the Docling Node Parser splits them into nodes.
Features
Structure-Aware
Preserves document hierarchy and relationships
Lossless Export
JSON export maintains complete document structure
Smart Chunking
Node parser respects document structure when chunking
Rich Metadata
Includes page numbers, headings, and structural information
Use Cases
Knowledge Base RAG
Table-Aware Retrieval
Integration Benefits
Resources
Tutorial
Step-by-step guide
Reader Docs
API reference for Docling Reader
Parser Docs
API reference for Node Parser
GitHub
Source code
PyPI Packages
- llama-index-readers-docling - Docling Reader component
- llama-index-node-parser-docling - Docling Node Parser component
Next Steps
- Follow the official tutorial
- Explore document conversion options
- Learn about export formats
- Build your first RAG application