Extraction Strategies

Overview

LLM Magic provides multiple extraction strategies to handle different document sizes and complexity levels. Each strategy balances speed, cost, and accuracy differently.

Available Strategies

Access the list of available strategies:

use Mateffy\Magic;

$strategies = Magic::getExtractionStrategies();
// Returns: Collection<string, class-string<Strategy>>

Built-in strategies:

simple - Single-pass extraction
sequential - Sequential batch processing
sequential-auto-merge - Sequential with automatic merging
parallel - Parallel batch processing
parallel-auto-merge - Parallel with automatic merging
double-pass - Two-pass extraction for maximum accuracy
double-pass-auto-merge - Two-pass with automatic merging

Simple Strategy

The simplest and fastest strategy. Processes only the first batch of artifacts.

use Mateffy\Magic;

$data = Magic::extract()
    ->schema([
        'type' => 'object',
        'properties' => [
            'title' => ['type' => 'string'],
            'summary' => ['type' => 'string'],
        ]
    ])
    ->artifacts([$document])
    ->strategy('simple')
    ->send();

Best for: Short documents, quick summaries, single-page PDFs
Pros: Fastest, lowest cost
Cons: Only processes the first batch of artifacts

Sequential Strategy

Processes artifacts in batches sequentially, passing previous data to the next batch for context.

$data = Magic::extract()
    ->schema($schema)
    ->artifacts($documents)
    ->strategy('sequential')
    ->send();

Best for: Long documents, maintaining context across pages
Pros: Maintains context, processes all artifacts
Cons: Slower than parallel because batches run one after another
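To make the context hand-off concrete, here is a minimal sketch of the idea, not LLM Magic's actual implementation. The names runSequential and $extractBatch are hypothetical; the callable stands in for a single LLM call that receives one batch plus the data extracted so far.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of sequential batching (not part of the LLM Magic API):
 * every batch receives the data accumulated from all earlier batches.
 *
 * @param array<int, array<int, string>> $batches
 * @param callable(array, array): array $extractBatch
 */
function runSequential(array $batches, callable $extractBatch): array
{
    $data = [];
    foreach ($batches as $batch) {
        // The previous result is passed in so the next call can build on it.
        $data = $extractBatch($batch, $data);
    }

    return $data;
}

// Stand-in for an LLM call: appends the batch to a running list of pages.
$fakeExtract = function (array $batch, array $previous): array {
    $pages = $previous['pages'] ?? [];

    return ['pages' => array_merge($pages, $batch)];
};

$result = runSequential([['p1', 'p2'], ['p3']], $fakeExtract);
```

Because each call depends on the result of the one before it, the calls cannot overlap, which is why this approach trades speed for context.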

Parallel Strategy

Processes batches concurrently, then merges results using an LLM.

$data = Magic::extract()
    ->schema($schema)
    ->artifacts($documents)
    ->strategy('parallel')
    ->concurrency(5)  // Process 5 batches at once
    ->send();

Best for: Large documents where speed is critical
Pros: Fastest for large documents, processes all artifacts
Cons: Higher cost, requires merge step, may lose some context
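The shape of this strategy can be sketched as an independent extraction per batch followed by a merge. This is an illustration under stated assumptions, not the library's code: runParallelSketch, $extractBatch and $merge are hypothetical names, and array_map runs the calls one after another here, standing in for calls that would be issued concurrently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the parallel shape (not part of the LLM Magic API).
 * Each batch is extracted in isolation, then the partial results are merged.
 */
function runParallelSketch(array $batches, callable $extractBatch, callable $merge): array
{
    // No batch sees another batch's output; this is where context can be lost.
    $partials = array_map($extractBatch, $batches);

    // The merge step combines the partial results into one structure.
    return $merge($partials);
}

// Stand-ins for the LLM extraction call and the merge call.
$extract = fn (array $batch): array => ['items' => $batch];
$merge = fn (array $partials): array => [
    'items' => array_merge(...array_column($partials, 'items')),
];

$result = runParallelSketch([['a', 'b'], ['c']], $extract, $merge);
```

Since no batch depends on another, the extraction calls can overlap freely; the price is the extra merge step and the lack of shared context between batches.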

Double-Pass Strategy

Performs two passes: first parallel for broad coverage, then sequential for detail and accuracy.

$data = Magic::extract()
    ->schema($schema)
    ->artifacts($documents)
    ->strategy('double-pass')
    ->send();

Best for: Complex documents requiring high accuracy
Pros: Highest accuracy, catches details missed in first pass
Cons: Highest cost, slowest processing time
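The cost profile follows from the structure: every batch is visited twice. The sketch below illustrates that under stated assumptions and is not the library's implementation. runDoublePass, $extractBatch and $merge are hypothetical names, and the call counter exists only to show how many extraction calls are made.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of a two-pass run (not part of the LLM Magic API).
 */
function runDoublePass(array $batches, callable $extractBatch, callable $merge): array
{
    // Pass 1: each batch is extracted without context, then merged into a draft.
    $drafts = array_map(fn (array $batch): array => $extractBatch($batch, []), $batches);
    $data = $merge($drafts);

    // Pass 2: revisit the batches in order, using the draft as context.
    foreach ($batches as $batch) {
        $data = $extractBatch($batch, $data);
    }

    return $data;
}

$calls = 0;

// Stand-in for an LLM call: unions the batch into the previous data.
$extract = function (array $batch, array $previous) use (&$calls): array {
    $calls++;

    return array_values(array_unique(array_merge($previous, $batch)));
};
$merge = fn (array $drafts): array => array_values(array_unique(array_merge(...$drafts)));

$result = runDoublePass([['a'], ['b'], ['c']], $extract, $merge);
```

With three batches the sketch makes six extraction calls plus one merge, roughly double the work of a single pass.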

Auto-Merge Variants

Strategies ending in -auto-merge skip the LLM merge step and use automatic data merging:

// Uses SmartDataMerger instead of LLM merge
->strategy('parallel-auto-merge')

Auto-merge variants are faster and cheaper but may produce less coherent results for complex schemas.
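To illustrate what a deterministic merge can look like, here is a small sketch. It is not the SmartDataMerger algorithm, whose rules the text above does not describe; mergeExtracted is a hypothetical function with assumed rules: lists are concatenated and de-duplicated, nested objects are merged recursively, and for conflicting scalar values the earlier one wins. It requires PHP 8.1 or later for array_is_list.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical rule-based merge of two partial extraction results.
 * Not the SmartDataMerger implementation; the rules are assumptions.
 */
function mergeExtracted(array $left, array $right): array
{
    // Two lists: concatenate and drop duplicates.
    if (array_is_list($left) && array_is_list($right)) {
        return array_values(array_unique(array_merge($left, $right), SORT_REGULAR));
    }

    foreach ($right as $key => $value) {
        $missing = !array_key_exists($key, $left) || $left[$key] === null || $left[$key] === '';

        if ($missing) {
            // Fill gaps from the later result.
            $left[$key] = $value;
        } elseif (is_array($left[$key]) && is_array($value)) {
            // Merge nested structures recursively.
            $left[$key] = mergeExtracted($left[$key], $value);
        }
        // Otherwise both sides hold a scalar: keep the earlier value.
    }

    return $left;
}

$first = ['title' => 'Report', 'summary' => null, 'tags' => ['x']];
$second = ['title' => 'Other', 'summary' => 'Short', 'tags' => ['x', 'y']];

$merged = mergeExtracted($first, $second);
```

The conflicting title shows the limitation: a fixed rule simply keeps one value, whereas an LLM merge could weigh both against the schema.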

Registering Custom Strategies

Extend the extraction system with custom strategies:

use Mateffy\Magic;
use App\Extraction\MyCustomStrategy;

Magic::registerStrategy('my-custom', MyCustomStrategy::class);

// Use it
Magic::extract()
    ->schema($schema)
    ->artifacts($documents)
    ->strategy('my-custom')
    ->send();

Your custom strategy must extend Extractor:

use Mateffy\Magic\Extraction\Strategies\Extractor;

class MyCustomStrategy extends Extractor
{
    public function run(array $artifacts): array
    {
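        // Your implementation: return the extracted data as an array.
        // The empty array is a placeholder so the declared return type is satisfied.
        return [];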
    }

    public static function getLabel(): string
    {
        return 'My Custom Strategy';
    }

    public function getEstimatedSteps(array $artifacts): int
    {
        return 1;
    }
}

Strategy Comparison

Simple

Speed: Fastest
Cost: Lowest
Accuracy: Basic
Use case: Short documents

Sequential

Speed: Moderate
Cost: Moderate
Accuracy: Good
Use case: Long documents

Parallel

Speed: Fast
Cost: Higher
Accuracy: Good
Use case: Large documents

Double-Pass

Speed: Slowest
Cost: Highest
Accuracy: Best
Use case: Complex extraction
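The comparison can be folded into a small decision helper. The sketch below is one possible reading of the guidance above, not a function the library provides: chooseStrategy and its parameters are hypothetical, and only the returned strategy names come from the built-in list.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper that maps requirements to a built-in strategy name.
 * The decision order is an assumption derived from the comparison above.
 */
function chooseStrategy(int $batchCount, bool $needsHighAccuracy, bool $speedCritical): string
{
    // A single batch is fully covered by the single-pass strategy.
    if ($batchCount <= 1) {
        return 'simple';
    }

    // Accuracy takes priority over speed and cost.
    if ($needsHighAccuracy) {
        return 'double-pass';
    }

    // Otherwise trade context (sequential) against speed (parallel).
    return $speedCritical ? 'parallel' : 'sequential';
}

$strategy = chooseStrategy(12, false, true);
```

The returned string can be passed to ->strategy() in place of a hard-coded name.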

Performance Tips

Choosing Chunk Size

Adjust chunk size based on your model’s context window:

->chunkSize(100000)  // For larger context models
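One way to arrive at a value is to subtract the space needed for the prompt and the response from the context window. The sketch below assumes the chunk size is measured in the same unit as the context window (for example tokens); the text above does not state the unit. chunkBudget and estimateBatchCount are hypothetical helpers, and the reserve figures are placeholders.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: space left for document content after reserving
 * room for instructions/schema and for the model's output.
 */
function chunkBudget(int $contextWindow, int $promptReserve, int $outputReserve): int
{
    $budget = $contextWindow - $promptReserve - $outputReserve;

    if ($budget <= 0) {
        throw new InvalidArgumentException('Reserves exceed the context window.');
    }

    return $budget;
}

/**
 * Hypothetical helper: how many batches a document of a given size needs.
 */
function estimateBatchCount(int $documentSize, int $chunkSize): int
{
    return max(1, (int) ceil($documentSize / $chunkSize));
}

$chunkSize = chunkBudget(128000, 4000, 8000);
$batches = estimateBatchCount(300000, $chunkSize);
```

A larger chunk size means fewer batches and fewer calls, but each call carries more input.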

Concurrency Settings

Control how many batches process simultaneously:

->concurrency(10)  // Process up to 10 batches at once
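To reason about the effect of this setting, it can help to estimate how many rounds of requests a run needs. The sketch below assumes batches are dispatched in fixed waves of at most the concurrency limit; a real scheduler may instead start a new batch as soon as one finishes. planWaves is a hypothetical helper, not a library function.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: groups batch numbers into waves of at most
 * $concurrency batches each.
 *
 * @return array<int, array<int, int>>
 */
function planWaves(int $batchCount, int $concurrency): array
{
    if ($batchCount < 1 || $concurrency < 1) {
        return [];
    }

    return array_chunk(range(1, $batchCount), $concurrency);
}

$waves = planWaves(23, 10);
```

Under that assumption, 23 batches at a concurrency of 10 take three waves; raising the limit beyond the batch count brings no further gain.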

Monitoring Progress

Track extraction progress with callbacks:

->onDataProgress(function (array $data) {
    Log::info('Extraction progress', ['data' => $data]);
})
