Docling is available in Crew AI as the CrewDoclingSource knowledge source, enabling multi-agent AI systems to access and process document content with high fidelity.

Overview

The Crew AI Docling integration provides:
  • Document processing as a knowledge source for AI agents
  • High-fidelity extraction of content, tables, and structure
  • Seamless integration with Crew AI’s knowledge system
  • Support for multiple document formats

Installation

pip install crewai docling
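
After installing, it can help to confirm that both packages are importable before building a crew. The following is a minimal sketch using only the Python standard library; `missing_packages` is a hypothetical helper name and is not part of Crew AI or Docling.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # "crewai" and "docling" are the module names installed by the command above.
    missing = missing_packages(["crewai", "docling"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("crewai and docling are both importable")
```

The check only looks up each module without importing it, so it stays fast even though both packages are large.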

Quick Start

Here’s how to use Docling as a knowledge source in Crew AI:
from crewai import Agent, Task, Crew
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Create a Docling knowledge source
knowledge_source = CrewDoclingSource(
    file_paths=["document.pdf"]
)

# Create an agent with the knowledge source
agent = Agent(
    role="Research Analyst",
    goal="Analyze documents and extract key insights",
    backstory="You are an expert at analyzing complex documents.",
    knowledge_sources=[knowledge_source]
)

# Create a task
task = Task(
    description="Summarize the main findings from the document",
    agent=agent,
    expected_output="A concise summary of key findings"
)

# Create and run the crew
crew = Crew(
    agents=[agent],
    tasks=[task]
)

result = crew.kickoff()
print(result)

Multi-Agent Document Analysis

from crewai import Agent, Task, Crew
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Create knowledge source
knowledge_source = CrewDoclingSource(
    file_paths=["report.pdf", "analysis.docx"]
)

# Create specialized agents
researcher = Agent(
    role="Document Researcher",
    goal="Extract and organize information from documents",
    backstory="Expert at finding relevant information in complex documents",
    knowledge_sources=[knowledge_source]
)

analyst = Agent(
    role="Data Analyst",
    goal="Analyze extracted data and identify patterns",
    backstory="Specialist in data analysis and pattern recognition",
    knowledge_sources=[knowledge_source]
)

writer = Agent(
    role="Technical Writer",
    goal="Create comprehensive reports based on analysis",
    backstory="Skilled at synthesizing complex information into clear reports"
)

# Define tasks
research_task = Task(
    description="Extract all key data points and findings from the documents",
    agent=researcher,
    expected_output="Structured list of key findings and data"
)

analysis_task = Task(
    description="Analyze the extracted data for trends and insights",
    agent=analyst,
    expected_output="Analysis report with identified trends",
    context=[research_task]
)

writing_task = Task(
    description="Write a comprehensive report combining research and analysis",
    agent=writer,
    expected_output="Final comprehensive report",
    context=[research_task, analysis_task]
)

# Create and run crew
crew = Crew(
    agents=[researcher, analyst, writer],
    tasks=[research_task, analysis_task, writing_task],
    verbose=True
)

result = crew.kickoff()
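
In the example above, each task's `context` lists only tasks that appear earlier in the crew's `tasks` list, so their output should already exist when the dependent task runs (assuming the crew executes tasks in list order). The sketch below checks that ordering on plain task names rather than Crew AI objects; `check_context_order` is a hypothetical helper, not a Crew AI function.

```python
def check_context_order(task_order, contexts):
    """Return (task, dependency) pairs where the dependency does not run earlier.

    task_order: task names in the order they are passed to the crew.
    contexts: mapping of task name -> list of task names it uses as context.
              Every key is expected to appear in task_order.
    """
    position = {name: index for index, name in enumerate(task_order)}
    problems = []
    for task, dependencies in contexts.items():
        for dependency in dependencies:
            # A dependency must be known and must come strictly before the task.
            if dependency not in position or position[dependency] >= position[task]:
                problems.append((task, dependency))
    return problems


order = ["research_task", "analysis_task", "writing_task"]
contexts = {
    "analysis_task": ["research_task"],
    "writing_task": ["research_task", "analysis_task"],
}
print(check_context_order(order, contexts))  # []
```

An empty list means every context dependency is scheduled before the task that consumes it.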

Processing Multiple Document Types

from crewai import Agent
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Process various document formats
knowledge_source = CrewDoclingSource(
    file_paths=[
        "financial_report.pdf",
        "presentation.pptx",
        "analysis.docx",
        "data.html"
    ]
)

agent = Agent(
    role="Document Analyzer",
    goal="Analyze documents across different formats",
    backstory="Expert at processing various document types",
    knowledge_sources=[knowledge_source]
)
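
When a file list comes from user input or a directory scan, it may be worth separating out files whose format is not expected before creating the knowledge source. This is a minimal sketch, assuming only the formats named on this page (PDF, DOCX, PPTX, HTML); Docling may accept more, and `split_by_support` is a hypothetical helper.

```python
from pathlib import Path

# Extensions taken from the formats this page names (PDF, DOCX, PPTX, HTML).
# Assumption: Docling may accept more formats; extend this set as needed.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".html"}


def split_by_support(paths):
    """Split paths into (supported, unsupported) lists based on file extension."""
    supported, unsupported = [], []
    for raw in paths:
        suffix = Path(raw).suffix.lower()  # case-insensitive comparison
        if suffix in SUPPORTED_EXTENSIONS:
            supported.append(raw)
        else:
            unsupported.append(raw)
    return supported, unsupported


supported, unsupported = split_by_support(
    ["financial_report.pdf", "presentation.PPTX", "notes.xyz"]
)
print(supported)    # ['financial_report.pdf', 'presentation.PPTX']
print(unsupported)  # ['notes.xyz']
```

The `supported` list can then be passed as `file_paths` to `CrewDoclingSource`, while `unsupported` can be logged or reported back to the user.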

Features

Multi-Agent Support

Share document knowledge across multiple agents

High-Fidelity Processing

Accurate extraction of text, tables, and structure

Format Support

Process PDF, DOCX, PPTX, HTML, and more

Knowledge Integration

Seamless integration with Crew AI’s knowledge system

Use Cases

Document-Based Research

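from crewai import Agent, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Research agent that analyzes academic papers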
knowledge_source = CrewDoclingSource(
    file_paths=["paper1.pdf", "paper2.pdf", "paper3.pdf"]
)

research_agent = Agent(
    role="Academic Researcher",
    goal="Analyze research papers and identify key contributions",
    backstory="PhD-level researcher with expertise in literature review",
    knowledge_sources=[knowledge_source]
)

task = Task(
    description="Compare methodologies across the three papers",
    agent=research_agent,
    expected_output="Comparative analysis of methodologies"
)

Compliance Analysis

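from crewai import Agent, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Compliance agent that checks documents against regulations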
knowledge_source = CrewDoclingSource(
    file_paths=["contract.pdf", "policy.docx"]
)

compliance_agent = Agent(
    role="Compliance Officer",
    goal="Ensure documents meet regulatory requirements",
    backstory="Expert in regulatory compliance and risk assessment",
    knowledge_sources=[knowledge_source]
)

task = Task(
    description="Review documents for compliance issues",
    agent=compliance_agent,
    expected_output="Compliance assessment report"
)

Integration Benefits

  1. Native Integration: Built-in support as an official Crew AI knowledge source
  2. Agent Collaboration: Multiple agents can access the same document knowledge
  3. Automatic Processing: Documents are processed automatically when agents need them
  4. Context Preservation: Maintains document structure and context for better agent understanding

Resources

  • Crew AI Docs: Knowledge source documentation
  • GitHub: Crew AI repository
  • PyPI: Crew AI package
  • Examples: Crew AI examples

Advanced Configuration

from crewai import Agent
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Configure Docling processing options
knowledge_source = CrewDoclingSource(
    file_paths=["document.pdf"],
    # Additional Docling options can be configured here
)

# Use with agents
agent = Agent(
    role="Document Expert",
    goal="Process documents with custom settings",
    backstory="Specialist in document processing",
    knowledge_sources=[knowledge_source],
    verbose=True
)
