Strategy interface
All strategies implement the ExtractionStrategy<T> interface. Strategies handle:
- Chunking artifacts into token budgets
- Building prompts for extraction and merging
- Running LLM calls with validation retries
- Merging or deduplicating results
- Emitting progress events via onStep
Available strategies
Struktur provides seven built-in strategies:
Simple
Single-shot extraction for small inputs that fit in one LLM call.
- Build extraction prompt with all artifacts
- Run single LLM call
- Validate and return
Parallel
Concurrent batch processing with LLM-based merge.
- Split artifacts into batches based on chunkSize
- Process batches concurrently (up to the concurrency limit)
- Merge all results using an LLM with the merge prompt
Options:
- model: Base extraction model
- mergeModel: Model for merging batch results
- chunkSize: Token budget per batch
- concurrency: Max parallel batches (default: all batches)
- maxImages: Optional image limit per batch
- outputInstructions: Extra system instructions
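
The batching and concurrency steps above can be sketched as follows. This is a minimal illustration, not Struktur's implementation: the Artifact shape, splitIntoBatches, and mapWithConcurrency are hypothetical names, and token counts are assumed to be computed beforehand.

```typescript
// Hypothetical artifact shape (an assumption, not Struktur's real type).
interface Artifact {
  id: string;
  tokens: number; // token count computed beforehand
}

// Greedily pack artifacts into batches whose token total stays within chunkSize.
// An artifact larger than chunkSize ends up alone in its own batch.
function splitIntoBatches(artifacts: Artifact[], chunkSize: number): Artifact[][] {
  const batches: Artifact[][] = [];
  let current: Artifact[] = [];
  let used = 0;
  for (const artifact of artifacts) {
    if (current.length > 0 && used + artifact.tokens > chunkSize) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(artifact);
    used += artifact.tokens;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Run `worker` over every item with at most `limit` calls in flight.
// Results are returned in input order, regardless of completion order.
async function mapWithConcurrency<I, O>(
  items: I[],
  limit: number,
  worker: (item: I, index: number) => Promise<O>,
): Promise<O[]> {
  const results: O[] = new Array(items.length);
  let next = 0;
  const runner = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so no two runners share an index
      results[index] = await worker(items[index], index);
    }
  };
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, runner));
  return results;
}
```

In this sketch, passing the number of batches as limit corresponds to the documented default of processing all batches at once.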
Sequential
Processes batches in order, passing context between chunks.
- Split artifacts into batches
- Process first batch
- For each subsequent batch, pass previous result as context
- Return final accumulated result
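
The sequential flow above amounts to a fold over the batches. The sketch below assumes a hypothetical extractStep callback that stands in for prompt building, the LLM call, and validation; neither the name nor the signature comes from Struktur.

```typescript
// Hypothetical signature for one extraction call (an assumption for illustration).
// `previous` is undefined for the first batch and the prior result afterwards.
type SequentialStep<B, T> = (batch: B, previous: T | undefined) => Promise<T>;

// Process batches strictly in order, handing each call the result so far.
async function runSequential<B, T>(
  batches: B[],
  extractStep: SequentialStep<B, T>,
): Promise<T | undefined> {
  let result: T | undefined = undefined;
  for (const batch of batches) {
    result = await extractStep(batch, result);
  }
  return result; // undefined only when there were no batches
}
```

Because each call waits for the previous one, total latency grows with the number of batches, which is the trade-off against the parallel strategies.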
Parallel auto-merge
Concurrent processing with schema-aware merge and hash-based deduplication.
- Process batches concurrently
- Schema-aware merge (arrays concatenate, objects merge, scalars prefer new)
- Hash-based exact duplicate removal
- LLM deduplication pass to find semantic duplicates
The merge step uses SmartDataMerger.
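
The merge rule (arrays concatenate, objects merge, scalars prefer new) and exact-duplicate removal can be sketched independently of the library. The code below is not the SmartDataMerger source; mergeValues, stableStringify, and dedupeExact are hypothetical helpers. For simplicity the canonical JSON string itself is used as the Set key, where a real implementation might store a digest of it instead.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Arrays concatenate, objects merge key by key, anything else takes the newer value.
function mergeValues(previous: Json | undefined, next: Json): Json {
  if (Array.isArray(previous) && Array.isArray(next)) return [...previous, ...next];
  if (isPlainObject(previous) && isPlainObject(next)) {
    const merged: { [key: string]: Json } = { ...previous };
    for (const [key, value] of Object.entries(next)) {
      merged[key] = mergeValues(previous[key], value);
    }
    return merged;
  }
  return next;
}

// Serialize with sorted object keys so that key order does not affect equality.
function stableStringify(value: Json): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (isPlainObject(value)) {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Keep the first occurrence of each structurally identical item.
function dedupeExact(items: Json[]): Json[] {
  const seen = new Set<string>();
  const unique: Json[] = [];
  for (const item of items) {
    const key = stableStringify(item);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(item);
    }
  }
  return unique;
}
```

Exact matching only catches byte-for-byte equivalent records after canonicalization, which is why a separate LLM pass is listed for semantic duplicates.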
Sequential auto-merge
Sequential processing with auto-merge and deduplication.
Double pass
Two-phase extraction: parallel first pass, sequential refinement.
- Pass 1: Parallel extraction and merge (like parallel)
- Pass 2: Sequential refinement through all batches with merged context
Estimated steps: batches.length * 2 + 3
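
The two passes can be sketched as a parallel extract-and-merge followed by an ordered refinement loop. The callbacks extractStep and mergeStep below are hypothetical stand-ins for the LLM extraction and merge calls, not Struktur functions.

```typescript
// Hypothetical callbacks (assumptions for illustration only).
// `context` is absent in pass 1 and holds the merged result in pass 2.
type DoublePassExtract<B, T> = (batch: B, context?: T) => Promise<T>;
type DoublePassMerge<T> = (results: T[]) => Promise<T>;

async function runDoublePass<B, T>(
  batches: B[],
  extractStep: DoublePassExtract<B, T>,
  mergeStep: DoublePassMerge<T>,
): Promise<T> {
  // Pass 1: extract every batch independently, then merge the partial results.
  const firstPass = await Promise.all(batches.map((batch) => extractStep(batch)));
  let result = await mergeStep(firstPass);

  // Pass 2: revisit each batch in order, refining the merged result.
  for (const batch of batches) {
    result = await extractStep(batch, result);
  }
  return result;
}
```

Each batch is visited twice in this sketch, so the quality gain comes at roughly double the extraction cost of a single parallel pass.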
Double pass auto-merge
Combines double-pass with auto-merge deduplication.
- Parallel auto-merge (extract + merge + dedupe)
- Sequential refinement pass
Common configuration
Most strategies share these options:
- model: Base extraction model (required)
- chunkSize: Token budget per batch (required for non-simple strategies)
- maxImages: Optional image limit per batch
- outputInstructions: Extra instructions appended to system prompt
- strict: Enable strict schema validation (for compatible models)
- execute: Custom retry executor (for testing)
- mergeModel: Model for LLM-based merging
- dedupeModel: Model for deduplication (defaults to model)
- dedupeExecute: Custom executor for dedupe pass
Progress tracking
Strategies emit onStep events for progress tracking. Each strategy estimates its total number of steps via getEstimatedSteps(artifacts).
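
A progress callback of this kind might be wired up as in the sketch below. The StepEvent shape and createProgressTracker are hypothetical; the actual onStep payload in Struktur may carry different fields.

```typescript
// Hypothetical event shape (an assumption; not taken from the library).
interface StepEvent {
  step: number;  // 1-based index of the step that just finished
  total: number; // estimated total number of steps
  label: string; // short description of the step
}

// Returns an `advance` function that emits one event per completed step.
function createProgressTracker(
  estimatedTotal: number,
  onStep: (event: StepEvent) => void,
): (label: string) => void {
  let step = 0;
  return (label: string): void => {
    step += 1;
    // If the estimate was too low, grow the total so step never exceeds it.
    onStep({ step, total: Math.max(estimatedTotal, step), label });
  };
}

// Usage: log a percentage for each step.
const advance = createProgressTracker(3, (event) => {
  const percent = Math.round((event.step / event.total) * 100);
  console.log(`${event.label}: ${percent}%`);
});
advance("extract batch 1");
advance("extract batch 2");
advance("merge");
```

Treating the step count as an estimate keeps a progress bar from overshooting when a strategy performs extra retries or merge calls.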
Choosing a strategy
Consider order
If document order matters (narratives, chronological data), use sequential or sequentialAutoMerge.
Evaluate speed vs quality
- For speed: parallel or parallelAutoMerge
- For quality: doublePass or doublePassAutoMerge
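
The guidance above can be condensed into a small decision helper. This is only a sketch: the Needs fields, the simple identifier, and the precedence of the checks (input size first, then order, then speed versus quality) are assumptions rather than rules stated by the library.

```typescript
type StrategyName =
  | "simple"
  | "parallel"
  | "sequential"
  | "parallelAutoMerge"
  | "sequentialAutoMerge"
  | "doublePass"
  | "doublePassAutoMerge";

// Hypothetical description of an extraction job.
interface Needs {
  fitsInOneCall: boolean;        // input is small enough for a single LLM call
  orderMatters: boolean;         // narratives, chronological data
  priority: "speed" | "quality";
  dedupe: boolean;               // results are likely to contain duplicates
}

function chooseStrategy(needs: Needs): StrategyName {
  if (needs.fitsInOneCall) return "simple";
  if (needs.orderMatters) return needs.dedupe ? "sequentialAutoMerge" : "sequential";
  if (needs.priority === "speed") return needs.dedupe ? "parallelAutoMerge" : "parallel";
  return needs.dedupe ? "doublePassAutoMerge" : "doublePass";
}
```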
Custom strategies
You can implement custom strategies by following the ExtractionStrategy<T> interface. See src/strategies/ for implementation patterns.