
Overview

Confidence scores provide quantitative assessment of document conversion quality, helping you:
  • Identify documents requiring manual review
  • Adjust conversion pipelines based on quality metrics
  • Set confidence thresholds for automated workflows
  • Catch potential conversion issues early
Confidence grades were introduced in v2.34.0 and are available in the confidence field of ConversionResult.

Quick Start

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("document.pdf")

# Check overall quality
print(f"Mean Grade: {result.confidence.mean_grade}")
print(f"Low Grade: {result.confidence.low_grade}")

# Check component scores
print(f"Layout: {result.confidence.layout_grade}")
print(f"OCR: {result.confidence.ocr_grade}")

# Review page-level confidence
for page_conf in result.confidence.pages:
    print(f"Page {page_conf.page_no}: {page_conf.mean_grade}")

Purpose and Use Cases

Complex layouts, poor scan quality, or challenging formatting can lead to suboptimal conversion results. Confidence scores help you:

Quality Assurance

Identify documents that may need manual review after conversion

Pipeline Optimization

Choose the conversion pipeline best suited to each document type

Threshold Setting

Set confidence thresholds for unattended batch conversions

Early Detection

Catch potential conversion issues early in your workflow

Scores and Grades

Numerical Scores

Scores are numerical values between 0.0 and 1.0, where higher values indicate better conversion quality.
Scores are primarily for internal use. Their computation and weighting may change in future releases. Use grades for decision-making.

Quality Grades

Grades are categorical quality assessments based on score thresholds:
| Grade | Meaning | Recommended Action |
| --- | --- | --- |
| EXCELLENT | Very high quality conversion | Use as-is |
| GOOD | Reliable conversion quality | Safe for most use cases |
| FAIR | Acceptable but may have issues | Review if accuracy is critical |
| POOR | Low quality conversion | Manual review recommended |
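Since grades are derived from score thresholds, the mapping can be sketched as a simple cascade. The cutoff values below are illustrative assumptions, not docling's actual thresholds (which may change between releases):

```python
def grade_from_score(score: float) -> str:
    """Map a 0.0-1.0 confidence score to a categorical grade.

    NOTE: these cutoffs are illustrative assumptions, not docling's
    actual thresholds.
    """
    if score >= 0.9:
        return "EXCELLENT"
    if score >= 0.8:
        return "GOOD"
    if score >= 0.5:
        return "FAIR"
    return "POOR"

print(grade_from_score(0.93))  # → EXCELLENT
print(grade_from_score(0.62))  # → FAIR
```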
from docling_core.types.doc import ConfidenceGrade

if result.confidence.mean_grade == ConfidenceGrade.POOR:
    print("⚠️ This document may need manual review")
elif result.confidence.mean_grade >= ConfidenceGrade.GOOD:
    print("✅ High quality conversion")
Focus on quality grades! Users should rely on document-level grade fields (mean_grade and low_grade) to assess overall conversion quality.

Component Confidence Scores

Each confidence report includes four component scores and grades:

Layout Score

Overall quality of document element recognition
  • Measures how well document structure was detected
  • Includes paragraphs, headings, lists, tables, figures
  • Based on model prediction confidence
print(f"Layout Quality: {result.confidence.layout_grade}")
print(f"Layout Score: {result.confidence.layout_score:.2f}")

OCR Score

Quality of OCR-extracted content
  • Evaluates text recognition quality from scanned pages
  • Only relevant for documents requiring OCR
  • Higher scores indicate more confident character recognition
print(f"OCR Quality: {result.confidence.ocr_grade}")
print(f"OCR Score: {result.confidence.ocr_score:.2f}")

Parse Score

10th percentile score of digital text cells
  • Emphasizes problem areas in text extraction
  • Based on text cell-level confidence
  • Highlights worst-performing regions
print(f"Parse Quality: {result.confidence.parse_grade}")
print(f"Parse Score: {result.confidence.parse_score:.2f}")
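Why a 10th percentile rather than a mean? A percentile surfaces the worst-performing cells instead of letting many good cells mask them. The snippet below illustrates the aggregation in plain Python with made-up cell confidences; it is not docling's internal implementation:

```python
import statistics

def percentile_score(cell_scores, pct=10):
    """Return the pct-th percentile of per-cell confidence scores.

    Unlike the mean, this emphasizes the worst-performing cells:
    roughly pct% of cells score at or below the returned value.
    """
    # quantiles(n=100) returns the 1st..99th percentile cut points
    cuts = statistics.quantiles(cell_scores, n=100, method="inclusive")
    return cuts[pct - 1]

# 15 weak cells barely move the mean but dominate the 10th percentile
cells = [0.40] * 15 + [0.95] * 85
print(f"mean: {statistics.mean(cells):.2f}")
print(f"10th percentile: {percentile_score(cells):.2f}")
```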

Table Score

Table extraction quality
Table confidence scoring is not yet implemented. This field is reserved for future use.
print(f"Table Quality: {result.confidence.table_grade}")
print(f"Table Score: {result.confidence.table_score:.2f}")

Summary Grades

Two aggregate grades provide overall document quality assessment:

Mean Grade

Average of the four component scores
  • Provides overall quality assessment
  • Balances all aspects of conversion
  • Recommended for most use cases
if result.confidence.mean_grade >= ConfidenceGrade.GOOD:
    # High confidence - proceed with automated processing
    process_automatically(result.document)
else:
    # Lower confidence - queue for review
    queue_for_review(result.document)

Low Grade

5th percentile score (highlights worst-performing areas)
  • Emphasizes problematic regions
  • More conservative quality metric
  • Useful for critical applications
if result.confidence.low_grade == ConfidenceGrade.POOR:
    # Worst areas are poor quality
    print("Document has significant quality issues")
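A useful pattern is comparing the two summary grades: a solid mean grade combined with a POOR low grade signals localized trouble rather than a uniformly bad conversion. The sketch below uses a hypothetical IntEnum in place of docling's ConfidenceGrade so it runs standalone:

```python
from enum import IntEnum

class Grade(IntEnum):
    """Hypothetical stand-in for ConfidenceGrade, ordered worst to best."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4

def summarize(mean_grade: Grade, low_grade: Grade) -> str:
    """Combine mean and low grades into a one-line verdict."""
    if mean_grade >= Grade.GOOD and low_grade >= Grade.FAIR:
        return "uniformly reliable"
    if mean_grade >= Grade.GOOD:
        return "mostly good, with localized problem areas"
    return "overall quality is questionable"

print(summarize(Grade.GOOD, Grade.POOR))  # → mostly good, with localized problem areas
```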

Page-Level vs Document-Level

Confidence grades are calculated at two levels:

Document-Level Confidence

Overall scores and grades for the entire document:
# Access document-level confidence
doc_confidence = result.confidence

print(f"Document Quality: {doc_confidence.mean_grade}")
print(f"Layout: {doc_confidence.layout_score:.2f}")
print(f"OCR: {doc_confidence.ocr_score:.2f}")
print(f"Parse: {doc_confidence.parse_score:.2f}")
Document-level scores are calculated as averages of page-level scores.

Page-Level Confidence

Individual scores and grades for each page:
# Access page-level confidence
for page_conf in result.confidence.pages:
    print(f"\nPage {page_conf.page_no}:")
    print(f"  Mean Grade: {page_conf.mean_grade}")
    print(f"  Low Grade: {page_conf.low_grade}")
    print(f"  Layout: {page_conf.layout_grade}")
    print(f"  OCR: {page_conf.ocr_grade}")
    print(f"  Parse: {page_conf.parse_grade}")

Identifying Problematic Pages

# Find pages with quality issues
problematic_pages = [
    page_conf.page_no
    for page_conf in result.confidence.pages
    if page_conf.mean_grade == ConfidenceGrade.POOR
]

if problematic_pages:
    print(f"Pages needing review: {problematic_pages}")

Practical Examples

Quality-Based Workflow Routing

from docling.document_converter import DocumentConverter
from docling_core.types.doc import ConfidenceGrade

def process_document_with_routing(file_path):
    converter = DocumentConverter()
    result = converter.convert(file_path)
    
    mean_grade = result.confidence.mean_grade
    
    if mean_grade == ConfidenceGrade.EXCELLENT:
        # High quality - automated processing
        return "auto_process", result.document
    
    elif mean_grade == ConfidenceGrade.GOOD:
        # Good quality - spot check recommended
        return "spot_check", result.document
    
    elif mean_grade == ConfidenceGrade.FAIR:
        # Fair quality - human review recommended
        return "human_review", result.document
    
    else:  # POOR
        # Poor quality - full manual processing
        return "manual_process", result.document

# Use in workflow
routing, document = process_document_with_routing("document.pdf")
print(f"Route to: {routing}")

Batch Processing with Thresholds

from pathlib import Path
from docling.document_converter import DocumentConverter
from docling_core.types.doc import ConfidenceGrade

def batch_convert_with_quality_check(input_dir, min_grade=ConfidenceGrade.GOOD):
    converter = DocumentConverter()
    
    results = {
        "processed": [],
        "review_needed": [],
        "failed": []
    }
    
    for pdf_file in Path(input_dir).glob("*.pdf"):
        try:
            result = converter.convert(str(pdf_file))
            
            if result.confidence.mean_grade >= min_grade:
                results["processed"].append({
                    "file": pdf_file.name,
                    "grade": result.confidence.mean_grade,
                    "document": result.document
                })
            else:
                results["review_needed"].append({
                    "file": pdf_file.name,
                    "grade": result.confidence.mean_grade,
                    "low_grade": result.confidence.low_grade,
                    "document": result.document
                })
        
        except Exception as e:
            results["failed"].append({
                "file": pdf_file.name,
                "error": str(e)
            })
    
    return results

# Process directory
results = batch_convert_with_quality_check(
    "./documents",
    min_grade=ConfidenceGrade.GOOD
)

print(f"Processed: {len(results['processed'])}")
print(f"Review needed: {len(results['review_needed'])}")
print(f"Failed: {len(results['failed'])}")

Component-Specific Analysis

from docling.document_converter import DocumentConverter
from docling_core.types.doc import ConfidenceGrade

def analyze_conversion_quality(file_path):
    converter = DocumentConverter()
    result = converter.convert(file_path)
    
    conf = result.confidence
    
    print(f"\n📄 Quality Report: {file_path}")
    print("=" * 50)
    print(f"Overall Mean Grade: {conf.mean_grade}")
    print(f"Overall Low Grade: {conf.low_grade}")
    print()
    print("Component Breakdown:")
    print(f"  Layout: {conf.layout_grade} (score: {conf.layout_score:.3f})")
    print(f"  OCR: {conf.ocr_grade} (score: {conf.ocr_score:.3f})")
    print(f"  Parse: {conf.parse_grade} (score: {conf.parse_score:.3f})")
    print(f"  Table: {conf.table_grade} (score: {conf.table_score:.3f})")
    
    # Identify weak areas
    weak_components = []
    if conf.layout_grade < ConfidenceGrade.GOOD:
        weak_components.append("layout detection")
    if conf.ocr_grade < ConfidenceGrade.GOOD:
        weak_components.append("OCR quality")
    if conf.parse_grade < ConfidenceGrade.GOOD:
        weak_components.append("text parsing")
    
    if weak_components:
        print(f"\n⚠️ Weak areas: {', '.join(weak_components)}")
    else:
        print("\n✅ All components meet quality threshold")
    
    # Page-level analysis
    poor_pages = [
        p.page_no for p in conf.pages
        if p.mean_grade == ConfidenceGrade.POOR
    ]
    
    if poor_pages:
        print(f"\n🔍 Pages needing attention: {poor_pages}")
    
    return result

analyze_conversion_quality("complex_document.pdf")

Export Quality Report

import json
from docling.document_converter import DocumentConverter

def export_quality_report(file_path, output_json):
    converter = DocumentConverter()
    result = converter.convert(file_path)
    
    report = {
        "document": file_path,
        "overall": {
            "mean_grade": str(result.confidence.mean_grade),
            "low_grade": str(result.confidence.low_grade),
        },
        "components": {
            "layout": {
                "grade": str(result.confidence.layout_grade),
                "score": result.confidence.layout_score,
            },
            "ocr": {
                "grade": str(result.confidence.ocr_grade),
                "score": result.confidence.ocr_score,
            },
            "parse": {
                "grade": str(result.confidence.parse_grade),
                "score": result.confidence.parse_score,
            },
            "table": {
                "grade": str(result.confidence.table_grade),
                "score": result.confidence.table_score,
            },
        },
        "pages": [
            {
                "page_no": p.page_no,
                "mean_grade": str(p.mean_grade),
                "low_grade": str(p.low_grade),
                "layout_grade": str(p.layout_grade),
                "ocr_grade": str(p.ocr_grade),
                "parse_grade": str(p.parse_grade),
            }
            for p in result.confidence.pages
        ]
    }
    
    with open(output_json, 'w') as f:
        json.dump(report, f, indent=2)
    
    print(f"Quality report exported to {output_json}")
    return report

export_quality_report("document.pdf", "quality_report.json")

Interpretation Guidelines

Safe for automated processing when:
  • Mean grade is GOOD or EXCELLENT
  • Low grade is at least FAIR
  • All component grades are FAIR or better
  • Use case allows for minor errors

Manual review recommended when:
  • Mean grade is FAIR
  • Low grade is POOR (indicates problem areas)
  • Critical components (layout, OCR) have low grades
  • High-accuracy requirements

Consider reprocessing when:
  • Mean grade is POOR
  • Multiple components have POOR grades
  • Try alternative pipelines or preprocessing
  • Consider manual data entry for critical documents

OCR-specific guidance:
  • Low OCR grades often indicate:
    • Poor scan quality
    • Unusual fonts or handwriting
    • Complex layouts interfering with text detection
  • Consider image preprocessing or different OCR engines
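The guidelines above can be condensed into a single routing helper. This is a sketch, not a docling API: the Grade IntEnum is a hypothetical stand-in for ConfidenceGrade (which lives in docling_core.types.doc) so the snippet runs without the library:

```python
from enum import IntEnum

class Grade(IntEnum):
    """Hypothetical stand-in for ConfidenceGrade, ordered worst to best."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4

def recommend(mean_grade, low_grade, component_grades):
    """Turn the interpretation guidelines into a routing decision.

    component_grades: dict mapping component name -> Grade.
    """
    poor_components = sum(g == Grade.POOR for g in component_grades.values())
    if mean_grade == Grade.POOR or poor_components >= 2:
        return "reprocess"  # try another pipeline, or fall back to manual entry
    if (mean_grade >= Grade.GOOD
            and low_grade >= Grade.FAIR
            and all(g >= Grade.FAIR for g in component_grades.values())):
        return "trust"
    return "review"

grades = {"layout": Grade.GOOD, "ocr": Grade.FAIR, "parse": Grade.GOOD}
print(recommend(Grade.GOOD, Grade.FAIR, grades))  # → trust
```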

Visualization Example

[Image: Example visualization showing document-level and page-level confidence grades]

Best Practices

Use Grades, Not Scores

Focus on categorical grades for decision-making. Numerical scores may change between versions.

Set Context-Appropriate Thresholds

Critical applications may require EXCELLENT grade, while general use can accept GOOD.

Monitor Component Scores

Track which components typically cause issues to improve preprocessing.

Page-Level Granularity

Use page-level confidence to identify specific problem areas in large documents.

Limitations

  • Confidence scores reflect model certainty, not absolute accuracy
  • Table scoring is not yet implemented
  • Score calculation may evolve in future versions
  • High confidence doesn’t guarantee 100% accuracy
  • Low confidence doesn’t always mean poor results

Pipeline Selection

Choose the right pipeline for quality

Quality Optimization

Improve conversion quality

Batch Processing

Process multiple documents efficiently

Error Handling

Handle conversion errors gracefully
