LangChain Integration

Docling is available as an official LangChain extension, the langchain-docling package, which loads and converts documents for use in your LangChain applications.

Overview

The LangChain Docling integration allows you to:

Load documents with high-fidelity structural preservation
Extract tables, images, and complex layouts accurately
Convert documents to LangChain Document objects
Build RAG applications on top of structure-aware document content

Installation

pip install langchain-docling
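
To confirm the install from Python, one option is to check whether the module can be located without importing it. This is a minimal sketch using only the standard library; the helper name `is_installed` is made up for illustration. Note that the pip package `langchain-docling` is imported as `langchain_docling`, as the Quick Start import shows.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


# The pip package "langchain-docling" is imported as "langchain_docling".
if not is_installed("langchain_docling"):
    print("langchain_docling not found; run: pip install langchain-docling")
```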

Quick Start

Here’s a simple example of using Docling with LangChain:

from langchain_docling import DoclingLoader

# Load a document
loader = DoclingLoader(file_path="document.pdf")
documents = loader.load()

# Use in your LangChain pipeline
for doc in documents:
    print(doc.page_content)
    print(doc.metadata)

Advanced Usage

Custom Conversion Options

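from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from langchain_docling import DoclingLoader

# Configure Docling's PDF pipeline: enable OCR and table structure recognition
pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.do_table_structure = True

# Wrap the options in a DocumentConverter; the loader takes a converter,
# not the pipeline options directly
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)

# Create loader with the custom converter
loader = DoclingLoader(
    file_path="document.pdf",
    converter=converter
)

documents = loader.load()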

Building a RAG Pipeline

from langchain_docling import DoclingLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Load documents
loader = DoclingLoader(file_path="document.pdf")
documents = loader.load()

# Split documents into overlapping chunks of at most 1000 characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(documents)

# Create vector store (OpenAIEmbeddings reads the OPENAI_API_KEY environment variable)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(splits, embeddings)

# Query: invoke() returns the most relevant Document objects
retriever = vectorstore.as_retriever()
results = retriever.invoke("What is the main topic?")
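
After retrieval, the matching chunks usually have to be assembled into a prompt for the language model. The following is a minimal sketch of that step, not part of the integration itself. `Doc` is a hypothetical stand-in for LangChain's `Document` so the example runs without extra packages, and `format_context`, `max_chars`, and the prompt wording are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs, max_chars=2000):
    """Join retrieved chunks into one context string, tagging each with its source."""
    parts = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        entry = f"[{i}] ({source})\n{doc.page_content.strip()}"
        if used + len(entry) > max_chars and parts:
            break  # stop before exceeding the budget, but always keep one chunk
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)


# Hand-written sample chunks standing in for the retriever's results.
results = [
    Doc("Docling converts PDFs into structured documents.", {"source": "document.pdf"}),
    Doc("Tables are exported with their cell structure.", {"source": "document.pdf"}),
]
context = format_context(results)
prompt = (
    "Answer using only the context below.\n\n"
    f"{context}\n\nQuestion: What is the main topic?"
)
print(prompt)
```

Numbering each chunk and recording its source makes it easier to ask the model to cite which chunk supports an answer.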

Features

High-Fidelity Parsing

Preserves document structure including tables, lists, and headings

OCR Support

Extract text from scanned documents and images

Table Extraction

Accurately parse complex table structures

Metadata Enrichment

Attaches metadata such as page numbers and structural context to each document
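
As an illustration of working with this metadata, the sketch below collects the page numbers a chunk refers to. The nested layout it walks (`dl_meta`, then `doc_items`, then `prov`, then `page_no`) is an assumption about the loader's chunk metadata, and the sample dictionary is hand-written rather than real loader output. Print `doc.metadata` from your own run to confirm the actual shape before relying on it.

```python
def page_numbers(metadata: dict) -> list[int]:
    """Collect the sorted, de-duplicated page numbers referenced by one chunk."""
    pages = set()
    dl_meta = metadata.get("dl_meta", {})
    for item in dl_meta.get("doc_items", []):
        for prov in item.get("prov", []):
            if "page_no" in prov:
                pages.add(prov["page_no"])
    return sorted(pages)


# Hand-written sample shaped like the assumed layout; not real loader output.
sample_metadata = {
    "source": "document.pdf",
    "dl_meta": {
        "headings": ["Introduction"],
        "doc_items": [
            {"label": "text", "prov": [{"page_no": 2}]},
            {"label": "text", "prov": [{"page_no": 2}, {"page_no": 3}]},
        ],
    },
}
print(page_numbers(sample_metadata))
```

The function uses `.get()` with empty defaults throughout, so metadata that lacks any of the assumed keys yields an empty list instead of raising an error.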

Supported Document Formats

The LangChain Docling integration supports:

PDF documents
Microsoft Word (DOCX)
PowerPoint (PPTX)
HTML files
Images (with OCR)
And more
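
When loading a mixed folder of files, it can help to filter by extension before handing paths to the loader. This is a small sketch under stated assumptions: the extension set is illustrative, derived from the formats listed above (the image extensions in particular are a guess), and `select_supported` is a made-up helper name.

```python
from pathlib import Path

# Illustrative extension set based on the formats listed above; not exhaustive.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".html", ".png", ".jpg", ".jpeg"}


def select_supported(paths):
    """Return the paths whose extension is in SUPPORTED_EXTENSIONS, in sorted order."""
    return sorted(
        str(p) for p in map(Path, paths) if p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


candidates = ["report.PDF", "notes.txt", "slides.pptx", "scan.png"]
print(select_supported(candidates))
```

The comparison lowercases the suffix, so `report.PDF` is kept while `notes.txt` is dropped.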

Integration Benefits

Official Integration

Maintained as part of the LangChain ecosystem with full support

Easy to Use

Simple API that follows LangChain conventions

Production Ready

Battle-tested in real-world applications

Active Development

Regular updates and improvements

Resources

Documentation

Official LangChain integration docs

GitHub

Source code and examples

Tutorial

Step-by-step guide

PyPI

Package repository

Example Notebooks

For complete working examples, check out:

RAG with LangChain and Docling - End-to-end RAG application
LangChain Documentation - Official tutorial

Next Steps

Explore the LangChain documentation
Check out example notebooks
Learn about document conversion options
Build your first RAG application
