
Overview

Table structure options configure how Docling extracts and reconstructs table structures from documents. The TableFormer model analyzes table layouts, identifies cells, rows, and columns, and reconstructs the logical structure.

TableStructureOptions

Configuration for table structure extraction using the TableFormer model.
from docling.datamodel.pipeline_options import TableStructureOptions, TableFormerMode

options = TableStructureOptions(
    do_cell_matching=True,
    mode=TableFormerMode.ACCURATE
)

Parameters

kind
Literal['docling_tableformer']
default:"'docling_tableformer'"
Model type identifier. Always set to "docling_tableformer" for TableFormer-based extraction.
do_cell_matching
bool
default:"True"
Enable cell matching to align detected table cells with their content. When enabled, the model's table structure predictions are matched back to the text cells extracted from the PDF, so that detected cell boundaries correspond to the text or data within them. When disabled, the table structure model defines the text cells itself and the PDF text cells are ignored.
mode
TableFormerMode
default:"TableFormerMode.ACCURATE"
Table structure extraction mode. Controls the trade-off between processing speed and extraction accuracy.
Options:
  • TableFormerMode.ACCURATE - Higher quality results with slower processing. Recommended for production use.
  • TableFormerMode.FAST - Prioritizes speed over precision. Suitable for simple tables or high-volume processing.
Choose based on your performance requirements and document complexity.
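The guidance above can be encoded as a small decision helper. This is a hypothetical sketch: `choose_table_mode` and its two flags are illustrative names, not part of Docling. The local `TableFormerMode` enum is a stand-in that mirrors Docling's `FAST` and `ACCURATE` members (with assumed string values `"fast"` and `"accurate"`) so the snippet runs without Docling installed.

```python
from enum import Enum


class TableFormerMode(str, Enum):
    # Stand-in for docling.datamodel.pipeline_options.TableFormerMode (values assumed).
    FAST = "fast"
    ACCURATE = "accurate"


def choose_table_mode(complex_tables: bool, high_volume: bool) -> TableFormerMode:
    """Hypothetical helper that applies the speed/accuracy guidance above."""
    if complex_tables:
        # Merged cells or nested headers: favor accuracy regardless of volume.
        return TableFormerMode.ACCURATE
    if high_volume:
        # Simple tables at scale: trade some precision for throughput.
        return TableFormerMode.FAST
    # Otherwise keep the default mode.
    return TableFormerMode.ACCURATE


print(choose_table_mode(complex_tables=False, high_volume=True).value)  # prints: fast
```

To use the result with Docling, replace the stand-in enum with the import from `docling.datamodel.pipeline_options` and pass the returned member to `TableStructureOptions(mode=...)`.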

TableFormerMode

Operating modes for the TableFormer table structure extraction model.
from docling.datamodel.pipeline_options import TableFormerMode

# Use accurate mode (recommended)
mode = TableFormerMode.ACCURATE

# Use fast mode for high-volume processing
mode = TableFormerMode.FAST

Values

FAST
str
Fast mode prioritizes speed over precision. Suitable for simple tables or high-volume processing where some accuracy can be traded for performance.
ACCURATE
str
Accurate mode provides higher quality results with slower processing. Recommended for complex tables and production use where accuracy is critical.
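Because both members are typed as `str`, a mode can be read from configuration text. The sketch below assumes the underlying values are the lowercase strings `"fast"` and `"accurate"` and uses a local stand-in enum; `parse_table_mode` and the `TABLE_MODE` environment variable are hypothetical names, not Docling settings.

```python
import os
from enum import Enum
from typing import Optional


class TableFormerMode(str, Enum):
    # Stand-in mirroring Docling's enum; the lowercase values are an assumption.
    FAST = "fast"
    ACCURATE = "accurate"


def parse_table_mode(
    raw: Optional[str], default: TableFormerMode = TableFormerMode.ACCURATE
) -> TableFormerMode:
    """Map a config string such as "fast" or " ACCURATE " to a mode (hypothetical helper)."""
    if raw is None or not raw.strip():
        return default
    try:
        # Enum lookup by value after normalizing case and surrounding whitespace.
        return TableFormerMode(raw.strip().lower())
    except ValueError:
        valid = ", ".join(m.value for m in TableFormerMode)
        raise ValueError(f"unknown table mode {raw!r}; expected one of: {valid}") from None


# TABLE_MODE is a hypothetical environment variable; unset means ACCURATE.
mode = parse_table_mode(os.environ.get("TABLE_MODE"))
print(mode.value)
```

Failing loudly on an unknown value avoids silently falling back to a mode the operator did not ask for.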

Usage

Basic Configuration

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    TableStructureOptions,
    TableFormerMode
)

# Configure table structure extraction
table_options = TableStructureOptions(
    do_cell_matching=True,
    mode=TableFormerMode.ACCURATE
)

pipeline_options = PdfPipelineOptions(
    do_table_structure=True,
    table_structure_options=table_options
)

# format_options is keyed by input format, not by the option class
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)

result = converter.convert("document_with_tables.pdf")

Fast Mode for High-Volume Processing

from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    TableStructureOptions,
    TableFormerMode
)

# Fast mode for processing many simple documents
table_options = TableStructureOptions(
    mode=TableFormerMode.FAST
)

pipeline_options = PdfPipelineOptions(
    do_table_structure=True,
    table_structure_options=table_options
)

Disable Cell Matching

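For very simple tables where cell boundaries are well-defined, or when text cells in the PDF are merged across table columns (which can break matched output), let the table structure model define the cell text itself instead of matching its predictions back to the PDF text cells:
from docling.datamodel.pipeline_options import TableStructureOptions, TableFormerMode

table_options = TableStructureOptions(
    do_cell_matching=False,  # Use cell text predicted by TableFormer; ignore PDF text cells
    mode=TableFormerMode.FAST
)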

Performance Considerations

TableFormerMode.ACCURATE

Best for:
  • Complex tables with merged cells, nested structures
  • Financial documents, spreadsheets, scientific papers
  • Production environments where accuracy is critical
Trade-offs:
  • Slower processing (2-3x compared to FAST mode)
  • Higher memory usage
  • Better handling of edge cases

TableFormerMode.FAST

Best for:
  • Simple, well-structured tables
  • High-volume batch processing
  • Preview or draft conversions
  • Documents with limited table complexity
Trade-offs:
  • Faster processing
  • Lower memory footprint
  • May miss complex cell relationships
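The speed difference quoted above will vary with hardware and document mix, so it is worth measuring on your own files. Below is a minimal timing sketch; `time_per_document` is a hypothetical helper that works with any callable taking a file path, and the dummy workload only exists so the snippet runs standalone.

```python
import time
from statistics import median
from typing import Callable, Iterable, List


def time_per_document(
    convert: Callable[[str], object], paths: Iterable[str], warmup: int = 1
) -> float:
    """Median wall-clock seconds per document for `convert` (hypothetical helper)."""
    paths = list(paths)
    if not paths:
        raise ValueError("need at least one document to time")
    # Untimed warm-up calls absorb one-off costs such as model loading.
    for path in paths[:warmup]:
        convert(path)
    timings: List[float] = []
    for path in paths:
        start = time.perf_counter()
        convert(path)
        timings.append(time.perf_counter() - start)
    return median(timings)


# Dummy workload standing in for a real conversion.
print(f"{time_per_document(lambda p: sum(range(10_000)), ['a', 'b', 'c']):.6f} s/doc")
```

To compare modes, pass the `convert` method of two `DocumentConverter` instances, one configured with `TableFormerMode.FAST` and one with `TableFormerMode.ACCURATE`, together with the same sample of PDFs. The median is used so a single slow outlier does not dominate the result.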

Table Structure Output

Table structure extraction produces TableItem objects with detailed cell information:
from docling.document_converter import DocumentConverter
from docling_core.types.doc import TableItem

converter = DocumentConverter()  # or reuse a converter configured as above
result = converter.convert("document.pdf")

# iterate_items yields (item, level) pairs; keep only the tables
for item, _level in result.document.iterate_items(traverse_pictures=False):
    if isinstance(item, TableItem):
        data = item.data
        print(f"Table: {data.num_rows} rows × {data.num_cols} cols")

        for cell in data.table_cells:
            print(f"Cell ({cell.start_row_offset_idx},{cell.start_col_offset_idx}): {cell.text}")
            print(f"  Spans: {cell.row_span} rows × {cell.col_span} cols")
            print(f"  Header: column={cell.column_header}, row={cell.row_header}")
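The cell offsets and spans are enough to rebuild a dense grid, for example before writing CSV. The sketch below uses a small `Cell` dataclass as a stand-in for Docling's cell objects (carrying only the field names used in the loop above) so it runs on its own; `build_grid` is a hypothetical helper. It assumes spans are at least 1 and copies a spanning cell's text into every position it covers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    # Stand-in with the same field names as the loop above.
    text: str
    start_row_offset_idx: int
    start_col_offset_idx: int
    row_span: int = 1
    col_span: int = 1


def build_grid(cells: List[Cell], num_rows: int, num_cols: int) -> List[List[str]]:
    """Expand cells (with spans) into a num_rows x num_cols grid of strings."""
    grid = [["" for _ in range(num_cols)] for _ in range(num_rows)]
    for cell in cells:
        row_end = cell.start_row_offset_idx + cell.row_span
        col_end = cell.start_col_offset_idx + cell.col_span
        for r in range(cell.start_row_offset_idx, row_end):
            for c in range(cell.start_col_offset_idx, col_end):
                # Skip positions outside the grid in case a span overruns the table edge.
                if r < num_rows and c < num_cols:
                    grid[r][c] = cell.text
    return grid


# A 2x3 table: "Region" spans two rows, "Sales" spans two columns.
cells = [
    Cell("Region", 0, 0, row_span=2),
    Cell("Sales", 0, 1, col_span=2),
    Cell("Q1", 1, 1),
    Cell("Q2", 1, 2),
]
for row in build_grid(cells, num_rows=2, num_cols=3):
    print(row)
# ['Region', 'Sales', 'Sales']
# ['Region', 'Q1', 'Q2']
```

For pandas output, `TableItem` also provides `export_to_dataframe()`, which may be simpler than a hand-rolled grid.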

Cell Matching Details

When do_cell_matching=True, the TableFormer model:
  1. Detects table structure - Identifies row and column boundaries
  2. Locates cell content - Finds text regions within detected cells
  3. Aligns content to cells - Matches text to the correct cell boundaries
  4. Handles merged cells - Correctly identifies cells spanning multiple rows/columns
  5. Validates structure - Ensures consistency between structure and content
This multi-step process improves accuracy but increases processing time.
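To make the alignment step concrete, here is a simplified illustration of one way to match text to predicted cells: assign each text box to the predicted cell it overlaps most, measured as intersection area divided by the text box's area, and only if that ratio clears a threshold. This is an assumed technique for illustration, not Docling's matching algorithm; the `Box` type, `match_text_to_cells`, and the 0.6 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    # Axis-aligned box: left, top, right, bottom (top < bottom).
    l: float
    t: float
    r: float
    b: float

    def area(self) -> float:
        return max(0.0, self.r - self.l) * max(0.0, self.b - self.t)

    def intersection(self, other: "Box") -> float:
        w = min(self.r, other.r) - max(self.l, other.l)
        h = min(self.b, other.b) - max(self.t, other.t)
        return max(0.0, w) * max(0.0, h)


def match_text_to_cells(
    cells: List[Box], texts: List[Tuple[str, Box]], min_overlap: float = 0.6
) -> Dict[int, List[str]]:
    """Map predicted cell index -> text fragments lying mostly inside that cell."""
    matches: Dict[int, List[str]] = {i: [] for i in range(len(cells))}
    for text, tbox in texts:
        area = tbox.area()
        if area == 0:
            continue
        best: Optional[int] = None
        best_ratio = 0.0
        for i, cbox in enumerate(cells):
            ratio = cbox.intersection(tbox) / area
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        # Text below the threshold (e.g. straddling two cells) stays unassigned.
        if best is not None and best_ratio >= min_overlap:
            matches[best].append(text)
    return matches


cells = [Box(0, 0, 50, 20), Box(50, 0, 100, 20)]  # two predicted cells side by side
texts = [
    ("Name", Box(5, 5, 30, 15)),
    ("Score", Box(55, 5, 90, 15)),
    ("Straddles", Box(30, 5, 70, 15)),  # half in each cell
]
matches = match_text_to_cells(cells, texts)
print(matches)  # {0: ['Name'], 1: ['Score']}
```

The straddling fragment shows why text that crosses a column boundary is hard to place: under this rule it fits neither cell well enough to be assigned.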

BaseTableStructureOptions

Abstract base class for the options of all table structure extraction models. Currently, TableStructureOptions is the only concrete implementation, using the TableFormer model.
from docling.datamodel.pipeline_options import BaseTableStructureOptions

# Use concrete implementation
from docling.datamodel.pipeline_options import TableStructureOptions
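A base class like this leaves room for other table models. One common way to dispatch on such a hierarchy is a registry keyed by the `kind` identifier; the sketch below shows that generic pattern with stand-in classes. The class names and the `"docling_tableformer"` identifier mirror this page, but the registry (`TABLE_MODEL_FACTORIES`, `build_table_model`) and `PlaceholderTableModel` are hypothetical and do not describe Docling's internal factory.

```python
from abc import ABC
from dataclasses import dataclass
from typing import Callable, ClassVar, Dict


class BaseTableStructureOptions(ABC):
    # Stand-in: every concrete options class declares a model identifier.
    kind: ClassVar[str]


@dataclass
class TableStructureOptions(BaseTableStructureOptions):
    kind: ClassVar[str] = "docling_tableformer"  # class-level, not a dataclass field
    do_cell_matching: bool = True
    mode: str = "accurate"


class PlaceholderTableModel:
    """Hypothetical model object constructed from its options."""

    def __init__(self, options: TableStructureOptions) -> None:
        self.options = options


# Hypothetical registry mapping each options `kind` to a model factory.
TABLE_MODEL_FACTORIES: Dict[str, Callable[..., object]] = {
    TableStructureOptions.kind: PlaceholderTableModel,
}


def build_table_model(options: BaseTableStructureOptions) -> object:
    try:
        factory = TABLE_MODEL_FACTORIES[options.kind]
    except KeyError:
        raise ValueError(f"no table model registered for kind {options.kind!r}") from None
    return factory(options)


model = build_table_model(TableStructureOptions(mode="fast"))
print(type(model).__name__, model.options.kind)  # PlaceholderTableModel docling_tableformer
```

Keying on `kind` means a new model only needs a new options subclass and one registry entry; callers keep passing options objects.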

Troubleshooting

Tables are not extracted

Ensure do_table_structure=True in your pipeline options:
pipeline_options = PdfPipelineOptions(
    do_table_structure=True,  # Must be enabled
    table_structure_options=TableStructureOptions()
)

Table structure is inaccurate

Try ACCURATE mode with cell matching enabled:
table_options = TableStructureOptions(
    do_cell_matching=True,
    mode=TableFormerMode.ACCURATE
)

Table processing is too slow

For faster processing on simple tables, switch to FAST mode:
table_options = TableStructureOptions(
    mode=TableFormerMode.FAST
)

Out-of-memory errors

Reduce the batch size for table processing:
pipeline_options = PdfPipelineOptions(
    do_table_structure=True,
    table_structure_batch_size=2  # Smaller batches
)
