Overview
Table structure options configure how Docling extracts and reconstructs table structures from documents. The TableFormer model analyzes table layouts, identifies cells, rows, and columns, and reconstructs the logical structure.
TableStructureOptions
Configuration for table structure extraction using the TableFormer model.
Parameters
kind
Model type identifier. Always set to "docling_tableformer" for TableFormer-based extraction.
do_cell_matching
Enable cell matching to align detected table cells with their content. When enabled, the model attempts to match table structure predictions with the actual cell content, so that detected cell boundaries correspond to the text or data within them.
mode
Table structure extraction mode. Controls the trade-off between processing speed and extraction accuracy.
Options:
- TableFormerMode.ACCURATE - Higher quality results with slower processing. Recommended for production use.
- TableFormerMode.FAST - Prioritizes speed over precision. Suitable for simple tables or high-volume processing.
TableFormerMode
Operating modes for the TableFormer table structure extraction model.
Values
FAST
Fast mode prioritizes speed over precision. Suitable for simple tables or high-volume processing where some accuracy can be traded for performance.
ACCURATE
Accurate mode provides higher quality results with slower processing. Recommended for complex tables and production use where accuracy is critical.
Usage
Basic Configuration
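A minimal sketch of a basic setup, assuming the usual Docling import paths (`docling.datamodel.pipeline_options`, `docling.datamodel.base_models`, `docling.document_converter`) and a PDF input. The file name `report.pdf` is a placeholder, not something taken from this page.

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    TableFormerMode,
    TableStructureOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

# Enable table structure extraction and configure TableFormer explicitly.
pipeline_options = PdfPipelineOptions(
    do_table_structure=True,
    table_structure_options=TableStructureOptions(
        mode=TableFormerMode.ACCURATE,
        do_cell_matching=True,
    ),
)

# Attach the pipeline options to the PDF format.
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)

# "report.pdf" is a placeholder path; point it at a real document to run a conversion.
# result = converter.convert("report.pdf")
```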
Fast Mode for High-Volume Processing
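For large batches of simple tables, the mode can be switched on an existing options object. This is a sketch under the same import-path assumption as any other Docling pipeline configuration; the resulting object would be passed to the converter through `PdfFormatOption(pipeline_options=...)`.

```python
from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode

pipeline_options = PdfPipelineOptions(do_table_structure=True)

# Trade some accuracy for throughput on simple tables or large batches.
pipeline_options.table_structure_options.mode = TableFormerMode.FAST
```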
Disable Cell Matching
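Cell matching is controlled by a single flag on the table structure options. A short sketch, assuming the standard `PdfPipelineOptions` import path:

```python
from docling.datamodel.pipeline_options import PdfPipelineOptions

pipeline_options = PdfPipelineOptions(do_table_structure=True)

# Skip the step that aligns predicted cells with their content.
pipeline_options.table_structure_options.do_cell_matching = False
```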
Disabling cell matching is suited to very simple tables where cell boundaries are well-defined.
Performance Considerations
ACCURATE Mode
Best for:
- Complex tables with merged cells or nested structures
- Financial documents, spreadsheets, scientific papers
- Production environments where accuracy is critical
Trade-offs:
- Slower processing (2-3x slower than FAST mode)
- Higher memory usage
- Better handling of edge cases
FAST Mode
Best for:
- Simple, well-structured tables
- High-volume batch processing
- Preview or draft conversions
- Documents with limited table complexity
Trade-offs:
- Faster processing
- Lower memory footprint
- May miss complex cell relationships
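The guidance above can be condensed into a small decision helper. This is only an illustrative sketch: `pick_mode_name` and its three flags are hypothetical names, not part of Docling. It returns the name of a `TableFormerMode` member, which could then be resolved with a standard enum lookup such as `TableFormerMode[name]`.

```python
def pick_mode_name(
    has_merged_cells: bool,
    has_nested_structure: bool,
    accuracy_critical: bool,
) -> str:
    """Return the TableFormerMode member name suggested by the guidance above.

    Hypothetical helper: any sign of table complexity, or a requirement for
    high accuracy, selects ACCURATE; everything else falls back to FAST.
    """
    if has_merged_cells or has_nested_structure or accuracy_critical:
        return "ACCURATE"
    return "FAST"


# A financial report with merged header cells.
financial_report = pick_mode_name(True, False, True)

# A bulk preview run over simple, well-structured tables.
bulk_preview = pick_mode_name(False, False, False)
```

Keeping the decision in one place makes it easy to adjust the rule later, for example after measuring processing time on a representative sample of documents.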
Table Structure Output
Table structure extraction produces TableItem objects with detailed cell information.
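One way to inspect that cell information is sketched below. It assumes each table exposes `data.num_rows`, `data.num_cols`, and `data.table_cells`, and that each cell carries `text`, `row_span`, `col_span`, `start_row_offset_idx`, and `start_col_offset_idx`, as in the docling-core data model. `describe_tables` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the sketch runs without a real conversion; with Docling you would pass `result.document` instead.

```python
from types import SimpleNamespace


def describe_tables(document):
    """Summarize each table: grid size plus one tuple per cell.

    Each cell tuple is (row, col, row_span, col_span, text).
    """
    summaries = []
    for table in document.tables:
        cells = [
            (
                cell.start_row_offset_idx,
                cell.start_col_offset_idx,
                cell.row_span,
                cell.col_span,
                cell.text,
            )
            for cell in table.data.table_cells
        ]
        summaries.append(
            {"rows": table.data.num_rows, "cols": table.data.num_cols, "cells": cells}
        )
    return summaries


# Stand-in objects that mimic only the attributes used above.
header = SimpleNamespace(
    start_row_offset_idx=0,
    start_col_offset_idx=0,
    row_span=1,
    col_span=2,  # a header cell spanning both columns
    text="Revenue",
)
fake_table = SimpleNamespace(
    data=SimpleNamespace(num_rows=2, num_cols=2, table_cells=[header])
)
fake_document = SimpleNamespace(tables=[fake_table])

summary = describe_tables(fake_document)
print(summary)
```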
Cell Matching Details
When do_cell_matching=True, the TableFormer model:
- Detects table structure - Identifies row and column boundaries
- Locates cell content - Finds text regions within detected cells
- Aligns content to cells - Matches text to the correct cell boundaries
- Handles merged cells - Correctly identifies cells spanning multiple rows/columns
- Validates structure - Ensures consistency between structure and content
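The alignment step in the list above can be pictured with a simplified sketch. This is not TableFormer's actual algorithm: it assumes axis-aligned bounding boxes with a top-left origin and assigns each text fragment to the predicted cell it overlaps most. `Box` and `match_text_to_cells` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def intersection_area(self, other: "Box") -> float:
        """Area shared by two boxes, or 0.0 if they do not overlap."""
        width = min(self.right, other.right) - max(self.left, other.left)
        height = min(self.bottom, other.bottom) - max(self.top, other.top)
        return max(0.0, width) * max(0.0, height)


def match_text_to_cells(
    cells: dict[str, Box], texts: list[tuple[str, Box]]
) -> dict[str, list[str]]:
    """Assign each text fragment to the cell with the largest overlap."""
    matched: dict[str, list[str]] = {name: [] for name in cells}
    for text, text_box in texts:
        best_name, best_overlap = None, 0.0
        for name, cell_box in cells.items():
            overlap = text_box.intersection_area(cell_box)
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        if best_name is not None:  # fragments outside every cell stay unmatched
            matched[best_name].append(text)
    return matched


cells = {
    "r0c0": Box(0, 0, 50, 20),
    "r0c1": Box(50, 0, 100, 20),
}
texts = [
    ("Item", Box(5, 5, 30, 15)),
    ("Price", Box(55, 5, 90, 15)),
    ("stray", Box(200, 200, 220, 210)),  # lies outside both cells
]
result = match_text_to_cells(cells, texts)
```

A maximum-overlap rule is the simplest choice for a sketch like this; a real implementation would also need to handle merged cells and check the result for consistency, as the list above describes.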
BaseTableStructureOptions
Abstract base class for all table structure extraction models. Currently, TableStructureOptions is the only concrete implementation, using the TableFormer model.
Troubleshooting
Tables not detected
Ensure do_table_structure=True in your pipeline options.
Incorrect cell boundaries
Try ACCURATE mode with cell matching enabled.
Slow processing
For faster processing on simple tables, switch the mode to TableFormerMode.FAST.
Memory issues
Reduce the batch size used for table processing.
See Also
- Pipeline Options - Pipeline configuration
- PDF Processing - Table extraction guide
- DoclingDocument - Document data structure