Why chunking matters
LLMs have context window limits (e.g., 128K tokens for GPT-4). When artifacts exceed these limits, Struktur must:
- Split individual artifacts into smaller parts
- Batch multiple artifact parts together up to the token budget
- Process batches according to the chosen strategy
Two-phase process
Artifact splitting
Large artifacts are split into parts using ArtifactSplitter. Each part respects token and image limits.

Artifact splitting
The ArtifactSplitter divides artifacts based on their contents array:
How splitting works
Split oversized text
If a content block’s text exceeds maxTokens, it is split into chunks. Text is split by character count using a token ratio (default: 4 chars/token).

Group contents into parts
Combine content blocks into artifact parts, respecting token and image budgets:
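The two steps above can be sketched as follows. This is an illustrative outline, not Struktur's actual ArtifactSplitter internals; the helper names `splitText` and `groupContents` and the `ContentBlock` shape are hypothetical.

```typescript
const CHARS_PER_TOKEN = 4; // default textTokenRatio: 4 chars per token

interface ContentBlock {
  text?: string;
  image?: string; // e.g. a URL or base64 payload
}

// Step 1: split an oversized text block into chunks by character count.
function splitText(text: string, maxTokens: number): string[] {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

// Step 2: greedily group content blocks into parts, starting a new part
// whenever adding a block would exceed the token or image budget.
function groupContents(
  contents: ContentBlock[],
  maxTokens: number,
  maxImages?: number
): ContentBlock[][] {
  const parts: ContentBlock[][] = [];
  let current: ContentBlock[] = [];
  let tokens = 0;
  let images = 0;

  for (const block of contents) {
    const blockTokens = Math.ceil((block.text?.length ?? 0) / CHARS_PER_TOKEN);
    const blockImages = block.image ? 1 : 0;
    const overBudget =
      current.length > 0 &&
      (tokens + blockTokens > maxTokens ||
        (maxImages !== undefined && images + blockImages > maxImages));
    if (overBudget) {
      parts.push(current);
      current = [];
      tokens = 0;
      images = 0;
    }
    current.push(block);
    tokens += blockTokens;
    images += blockImages;
  }
  if (current.length > 0) parts.push(current);
  return parts;
}
```

For example, a 1,000-character text with a 100-token budget (400 characters) splits into 3 chunks.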
Split artifact structure
Split artifacts maintain the original structure:

Batch creation
The ArtifactBatcher groups split artifacts into batches:
Batching algorithm
- Model max tokens: Respects modelMaxTokens if provided (uses the minimum of the user limit and the model limit)
- Greedy packing: Adds artifacts to the current batch until limits are exceeded
- Automatic splitting: Calls splitArtifact internally for oversized artifacts
- Image limits: Respects the optional maxImages cap per batch
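The greedy packing loop above can be sketched as follows. This is a simplified illustration: the `Artifact` shape and `createBatches` name are assumptions, and the real ArtifactBatcher additionally splits oversized artifacts via splitArtifact, which is omitted here.

```typescript
interface Artifact {
  tokens: number; // estimated token count
  images: number; // number of image contents
}

function createBatches(
  artifacts: Artifact[],
  chunkSize: number,
  modelMaxTokens?: number,
  maxImages?: number
): Artifact[][] {
  // Effective budget: minimum of the user limit and the model limit.
  const budget =
    modelMaxTokens !== undefined ? Math.min(chunkSize, modelMaxTokens) : chunkSize;

  const batches: Artifact[][] = [];
  let current: Artifact[] = [];
  let tokens = 0;
  let images = 0;

  for (const artifact of artifacts) {
    // Start a new batch when adding this artifact would exceed a limit.
    const wouldExceed =
      current.length > 0 &&
      (tokens + artifact.tokens > budget ||
        (maxImages !== undefined && images + artifact.images > maxImages));
    if (wouldExceed) {
      batches.push(current);
      current = [];
      tokens = 0;
      images = 0;
    }
    current.push(artifact);
    tokens += artifact.tokens;
    images += artifact.images;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Note how a modelMaxTokens lower than chunkSize tightens the budget and can produce more batches for the same input.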
Token counting
Struktur estimates token counts using a configurable ratio:

Counting artifact tokens
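A sketch of the estimation rule, combining the configurable ratio with the artifact.tokens override described later on this page. The `CountedArtifact` shape and `estimateTokens` name are illustrative, not Struktur's actual types.

```typescript
interface Content {
  text?: string;
}

interface CountedArtifact {
  tokens?: number; // optional pre-computed count
  contents: Content[];
}

// Estimate an artifact's token count: a pre-computed artifact.tokens value
// takes precedence; otherwise divide character counts by textTokenRatio.
function estimateTokens(artifact: CountedArtifact, textTokenRatio = 4): number {
  if (artifact.tokens !== undefined) return artifact.tokens;
  return artifact.contents.reduce(
    (sum, c) => sum + Math.ceil((c.text?.length ?? 0) / textTokenRatio),
    0
  );
}
```

With the default ratio, 100 characters of text estimate to 25 tokens.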
Strategy integration
Strategies use a helper to create batches:
- ParallelStrategy
- SequentialStrategy
- ParallelAutoMergeStrategy
- SequentialAutoMergeStrategy
- DoublePassStrategy
- DoublePassAutoMergeStrategy
SimpleStrategy does not chunk—it processes all artifacts in a single call.
Configuration options
Batch options
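The batch-related options referenced across this page can be gathered into one sketch. The field names come from this page, but the `BatchOptions` interface itself is illustrative; consult Struktur's actual types for the real shape.

```typescript
interface BatchOptions {
  chunkSize: number;        // token budget per batch
  modelMaxTokens?: number;  // hard model limit; the minimum of both is used
  maxImages?: number;       // optional image cap per batch
  textTokenRatio?: number;  // characters per token for estimation (default: 4)
}

const options: BatchOptions = {
  chunkSize: 50_000,
  modelMaxTokens: 128_000,
  maxImages: 10,
  textTokenRatio: 4,
};
```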
Strategy-level configuration
Best practices
Set chunkSize based on model limits
Leave headroom for prompts and schema:
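For example, a budget like the following leaves room for the model to respond. All of the reserved amounts below are illustrative numbers, not Struktur defaults.

```typescript
const modelContext = 128_000; // e.g. GPT-4 context window
const promptTokens = 2_000;   // system + instruction prompts (assumed size)
const schemaTokens = 1_000;   // output schema sent with the request (assumed size)
const outputBudget = 25_000;  // headroom for the model's response (assumed size)

// Budget what's left for artifact content per batch.
const chunkSize = modelContext - promptTokens - schemaTokens - outputBudget;
// 100,000 tokens
```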
Use maxImages for vision models
Vision models have image limits (e.g., 10 images per call for GPT-4V):
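Under such a cap, the minimum number of batches for N images is ceil(N / maxImages). A quick illustration with assumed counts:

```typescript
const maxImages = 10;   // per-call image cap (e.g. GPT-4V)
const totalImages = 34; // images across all artifacts (assumed)

// At least this many batches are needed to stay under the cap.
const minBatches = Math.ceil(totalImages / maxImages); // 4
```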
Adjust textTokenRatio for accuracy
If you have precise token counts (e.g., from tiktoken), adjust the ratio:
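One way to derive a corpus-specific ratio is to divide measured characters by measured tokens. The measured count below is a stand-in value, not a real tiktoken call:

```typescript
const sampleText = "The quick brown fox jumps over the lazy dog.";
const measuredTokens = 10; // assumed: what a real tokenizer reported

// Characters per token for this sample: 44 / 10 = 4.4.
const textTokenRatio = sampleText.length / measuredTokens;
```

A ratio measured on your own documents will track their actual tokenization (code, CJK text, and dense punctuation all tokenize differently) better than the 4 chars/token default.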
Pre-compute artifact tokens
For repeated extractions, pre-compute and cache token counts. If artifact.tokens is set, Struktur uses it instead of estimating.

Monitoring chunking
Strategies emit progress events showing batch counts:

Example: Large PDF extraction
- Splits the PDF into parts that fit within 50K tokens
- Batches parts together
- Processes batches in parallel (pass 1)
- Merges results
- Refines sequentially (pass 2)
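Walking through the first step numerically: with the 50K-token budget from this example, a PDF estimated at 180K tokens (an assumed figure) splits into ceil(180,000 / 50,000) parts.

```typescript
const pdfTokens = 180_000; // assumed estimate for the PDF
const chunkSize = 50_000;  // token budget from the example above

// Number of parts the PDF is split into.
const parts = Math.ceil(pdfTokens / chunkSize); // 4
```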