The `sequentialAutoMerge()` strategy processes document chunks sequentially, merging results with schema-aware logic as it goes. It then removes exact duplicates via hash comparison and uses an LLM to identify and remove semantic duplicates.
Usage
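A minimal sketch of how the strategy might be invoked. The `extract` entry point, package names, and the `model` option below are illustrative assumptions, not the library's confirmed API:

```typescript
import { openai } from "@ai-sdk/openai";
// `extract` and the package name are assumed for illustration.
import { extract, sequentialAutoMerge } from "extraction-lib";

const result = await extract({
  document,                    // the document to split into chunks
  schema,                      // target schema for structured extraction
  strategy: sequentialAutoMerge({
    model: openai("gpt-4o"),   // extraction model (assumed option name)
  }),
});
```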
Configuration
- The AI SDK language model to use for extraction.
- Maximum tokens per chunk. Documents are split into batches that fit within this limit.
- Maximum number of images per chunk. Useful for controlling vision API costs.
- Additional instructions to guide the model's output format or behavior.
- The AI SDK language model to use for semantic deduplication. Defaults to the extraction model.
- Custom retry executor function for extraction. Defaults to `runWithRetries`.
- Custom retry executor function for deduplication. Defaults to `runWithRetries`.
- Enable strict mode for structured output validation. Defaults to `false`.
When to use
- You have large documents with potential duplicate data
- Sequential processing is important for your use case
- You don’t want to write custom merge logic
- You want automatic deduplication
How it works
- Sequential extraction: Processes chunks one at a time
- Incremental merge: Uses `SmartDataMerger` to combine each result as it arrives
- Hash-based deduplication: Removes exact duplicates using hash comparison
- LLM deduplication: Uses an LLM to identify semantic duplicates and returns paths to remove
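The hash-based pass can be pictured with a small sketch. This is an assumed implementation for illustration; the strategy's real `SmartDataMerger` logic is schema-aware and more involved:

```typescript
// Remove exact duplicates by hashing each item's canonical serialization.
// A stable JSON string (keys sorted) stands in for a real hash here.
function dedupeExact<T>(items: T[]): T[] {
  const seen = new Set<string>();
  const out: T[] = [];
  for (const item of items) {
    const key = JSON.stringify(item, Object.keys(item as object).sort());
    if (!seen.has(key)) {
      seen.add(key);
      out.push(item);
    }
  }
  return out;
}

const people = [
  { name: "Ada", role: "engineer" },
  { name: "Ada", role: "engineer" }, // exact duplicate, removed
  { name: "Grace", role: "admiral" },
];
console.log(dedupeExact(people).length); // → 2
```

Note that hashing only catches byte-identical items; near-duplicates (e.g. "Ada" vs. "Ada Lovelace") are what the subsequent LLM pass is for.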
Trade-offs
Advantages:
- No custom merge logic needed
- Automatic duplicate removal
- Schema-aware merging
- Lower peak memory usage than parallel strategies
Disadvantages:
- Slower than parallel strategies (no concurrency)
- Higher token usage than basic sequential (dedupe step)
- Less control over merge strategy
Performance characteristics
The strategy estimates `batches.length + 3` steps:
- Prepare
- Extract from batch 1 through N (sequential)
- Dedupe
- Complete
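As a concrete check of the estimate, a document split into 5 batches yields 5 + 3 = 8 steps (`estimateSteps` is a hypothetical helper, not part of the library):

```typescript
// Step estimate from the list above: prepare + one extraction per batch
// + dedupe + complete.
function estimateSteps(batchCount: number): number {
  return batchCount + 3;
}

console.log(estimateSteps(5)); // → 8
```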