
Overview

LLM Magic provides multiple extraction strategies to handle different document sizes and complexity levels. Each strategy balances speed, cost, and accuracy differently.

Available Strategies

Access the list of available strategies:
use Mateffy\Magic;

$strategies = Magic::getExtractionStrategies();
// Returns: Collection<string, class-string<Strategy>>
Built-in strategies:
  • simple - Single-pass extraction
  • sequential - Sequential batch processing
  • sequential-auto-merge - Sequential with automatic merging
  • parallel - Parallel batch processing
  • parallel-auto-merge - Parallel with automatic merging
  • double-pass - Two-pass extraction for maximum accuracy
  • double-pass-auto-merge - Two-pass with automatic merging

Simple Strategy

The simplest and fastest strategy. Processes only the first batch of artifacts.
use Mateffy\Magic;

$data = Magic::extract()
    ->schema([
        'type' => 'object',
        'properties' => [
            'title' => ['type' => 'string'],
            'summary' => ['type' => 'string'],
        ]
    ])
    ->artifacts([$document])
    ->strategy('simple')
    ->send();
Best for: Short documents, quick summaries, single-page PDFs
Pros: Fastest, lowest cost
Cons: Only processes the first batch of artifacts

Sequential Strategy

Processes artifacts in batches sequentially, passing previous data to the next batch for context.
$data = Magic::extract()
    ->schema($schema)
    ->artifacts($documents)
    ->strategy('sequential')
    ->send();
Best for: Long documents, maintaining context across pages
Pros: Maintains context, processes all artifacts
Cons: Slower than parallel because batches run one at a time
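To make the "passing previous data to the next batch" idea concrete, here is a minimal, library-free sketch. The functions `extractBatch` and `runSequential` are hypothetical stand-ins written for this illustration; they are not part of LLM Magic, and the real strategy calls an LLM where the stub simply upper-cases strings.

```php
<?php

// Hypothetical stand-in for one LLM call: it receives a batch plus the data
// extracted so far and returns the updated data. Not part of LLM Magic.
function extractBatch(array $batch, array $previous): array
{
    $titles = $previous['titles'] ?? [];
    foreach ($batch as $page) {
        $titles[] = strtoupper($page);
    }

    return ['titles' => $titles];
}

// Sequential processing: each batch sees the result of all earlier batches.
function runSequential(array $batches, callable $extract): array
{
    $data = [];
    foreach ($batches as $batch) {
        $data = $extract($batch, $data);
    }

    return $data;
}

$batches = [['intro', 'scope'], ['terms']];
$result = runSequential($batches, 'extractBatch');
```

Because every call depends on the output of the one before it, the calls cannot overlap, which is where the speed cost comes from.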

Parallel Strategy

Processes batches concurrently, then merges results using an LLM.
$data = Magic::extract()
    ->schema($schema)
    ->artifacts($documents)
    ->strategy('parallel')
    ->concurrency(5)  // Process 5 batches at once
    ->send();
Best for: Large documents where speed is critical
Pros: Fastest for large documents, processes all artifacts
Cons: Higher cost, requires merge step, may lose some context
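The following sketch shows one way to picture how a concurrency limit shapes the work. It assumes batches are grouped into waves of at most `$concurrency`; that grouping, and the helpers `extractIndependent`, `mergeResults`, and `runParallel`, are assumptions made for illustration rather than the library's actual scheduler. The calls run one after another here to keep the example dependency-free.

```php
<?php

// Hypothetical stand-in for one LLM call on a single batch (no shared context).
function extractIndependent(array $batch): array
{
    return ['items' => $batch];
}

// Hypothetical stand-in for the merge step; the real strategy asks an LLM.
function mergeResults(array $partials): array
{
    $items = [];
    foreach ($partials as $partial) {
        $items = array_merge($items, $partial['items']);
    }

    return ['items' => $items];
}

// Batches are grouped into waves of at most $concurrency. Calls within a wave
// do not depend on each other, so they could be dispatched at the same time;
// this sketch simply loops over them.
function runParallel(array $batches, int $concurrency): array
{
    $partials = [];
    $waves = 0;
    foreach (array_chunk($batches, max(1, $concurrency)) as $wave) {
        $waves++;
        foreach ($wave as $batch) {
            $partials[] = extractIndependent($batch);
        }
    }

    return ['data' => mergeResults($partials), 'waves' => $waves];
}

$out = runParallel([['a'], ['b'], ['c'], ['d'], ['e'], ['f']], 5);
```

With six batches and a limit of five, the sketch needs two waves. No batch sees another batch's output until the merge, which is why some cross-batch context can be lost.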

Double-Pass Strategy

Performs two passes: first parallel for broad coverage, then sequential for detail and accuracy.
$data = Magic::extract()
    ->schema($schema)
    ->artifacts($documents)
    ->strategy('double-pass')
    ->send();
Best for: Complex documents requiring high accuracy
Pros: Highest accuracy, catches details missed in first pass
Cons: Highest cost, slowest processing time
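As a rough mental model, the two passes can be sketched as an independent pass that builds a draft, followed by a sequential pass that refines it. How the library actually seeds the second pass is not documented here, so treat the seeding below as an assumption; `firstPass` and `refine` are hypothetical stubs standing in for LLM calls.

```php
<?php

// Hypothetical first-pass extractor: each batch is handled without context
// and only discovers which fields exist.
function firstPass(array $batch): array
{
    return array_fill_keys($batch, null);
}

// Hypothetical second-pass extractor: refines the draft using one batch.
function refine(array $batch, array $draft): array
{
    foreach ($batch as $key) {
        $draft[$key] = 'checked';
    }

    return $draft;
}

$batches = [['title'], ['summary']];

// Pass 1: independent extraction, merged into one draft.
$draft = [];
foreach ($batches as $batch) {
    $draft = array_merge($draft, firstPass($batch));
}

// Pass 2: sequential refinement, seeded with the draft from pass 1.
$final = $draft;
foreach ($batches as $batch) {
    $final = refine($batch, $final);
}
```

Every batch is visited twice, which matches the cost and latency trade-off listed above.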

Auto-Merge Variants

Strategies ending in -auto-merge skip the LLM merge step and use automatic data merging:
// Uses SmartDataMerger instead of LLM merge
->strategy('parallel-auto-merge')
Auto-merge variants are faster and cheaper but may produce less coherent results for complex schemas.
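To illustrate what a deterministic merge can look like, here is a small sketch. It is not the library's SmartDataMerger; the rules (lists are concatenated and de-duplicated, nested objects are merged key by key, the first non-null scalar wins) are assumptions chosen for the example. It uses `array_is_list`, which requires PHP 8.1 or later.

```php
<?php

// Illustrative deterministic merge; NOT the library's SmartDataMerger.
// Assumed rules: lists are concatenated and de-duplicated, nested objects
// are merged key by key, and for scalars the first non-null value wins.
function mergeData(array $a, array $b): array
{
    if (array_is_list($a) && array_is_list($b)) {
        return array_values(array_unique(array_merge($a, $b), SORT_REGULAR));
    }

    foreach ($b as $key => $value) {
        if (!array_key_exists($key, $a) || $a[$key] === null) {
            // Fill in fields that the first result is missing.
            $a[$key] = $value;
        } elseif (is_array($a[$key]) && is_array($value)) {
            // Recurse into nested objects and lists.
            $a[$key] = mergeData($a[$key], $value);
        }
    }

    return $a;
}

$first = ['title' => null, 'tags' => ['x']];
$second = ['title' => 'Report', 'tags' => ['x', 'y']];
$merged = mergeData($first, $second);
```

Fixed rules like these cannot resolve two batches that disagree on the same field, which is the kind of case where an LLM merge may give more coherent output.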

Registering Custom Strategies

Extend the extraction system with custom strategies:
use Mateffy\Magic;
use App\Extraction\MyCustomStrategy;

Magic::registerStrategy('my-custom', MyCustomStrategy::class);

// Use it
Magic::extract()
    ->strategy('my-custom')
    ->send();
Your custom strategy must extend Extractor:
use Mateffy\Magic\Extraction\Strategies\Extractor;

class MyCustomStrategy extends Extractor
{
    public function run(array $artifacts): array
    {
        // Your implementation goes here; it must return the extracted data
        return [];
    }

    public static function getLabel(): string
    {
        return 'My Custom Strategy';
    }

    public function getEstimatedSteps(array $artifacts): int
    {
        return 1;
    }
}
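For a slightly fuller picture, the sketch below fills in all three methods. To keep it runnable without the package installed, it extends a local stand-in class that mirrors only the methods shown above; the real Extractor base class may require more. The batch size, the batch cap, and the class name `FirstTwoBatchesStrategy` are hypothetical.

```php
<?php

// Local stand-in so the sketch runs without the package installed. It only
// mirrors the three methods shown above.
abstract class ExtractorStandIn
{
    abstract public function run(array $artifacts): array;

    abstract public static function getLabel(): string;

    abstract public function getEstimatedSteps(array $artifacts): int;
}

class FirstTwoBatchesStrategy extends ExtractorStandIn
{
    // Hypothetical limits, counted in artifacts per batch and batches per run.
    private const BATCH_SIZE = 2;
    private const MAX_BATCHES = 2;

    public function run(array $artifacts): array
    {
        $batches = array_slice(
            array_chunk($artifacts, self::BATCH_SIZE),
            0,
            self::MAX_BATCHES
        );

        $data = ['seen' => []];
        foreach ($batches as $batch) {
            // A real strategy would call the LLM here.
            $data['seen'] = array_merge($data['seen'], $batch);
        }

        return $data;
    }

    public static function getLabel(): string
    {
        return 'First Two Batches';
    }

    public function getEstimatedSteps(array $artifacts): int
    {
        // One step per batch that will actually be processed.
        $batchCount = (int) ceil(count($artifacts) / self::BATCH_SIZE);

        return min(self::MAX_BATCHES, $batchCount);
    }
}

$strategy = new FirstTwoBatchesStrategy();
$data = $strategy->run(['a', 'b', 'c', 'd', 'e']);
$steps = $strategy->getEstimatedSteps(['a', 'b', 'c', 'd', 'e']);
```

Keeping `getEstimatedSteps` consistent with what `run` actually does is what makes a step estimate useful for progress reporting.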

Strategy Comparison

Simple

Speed: Fastest
Cost: Lowest
Accuracy: Basic
Use case: Short documents

Sequential

Speed: Moderate
Cost: Moderate
Accuracy: Good
Use case: Long documents

Parallel

Speed: Fast
Cost: Higher
Accuracy: Good
Use case: Large documents

Double-Pass

Speed: Slowest
Cost: Highest
Accuracy: Best
Use case: Complex extraction
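The comparison can be turned into a simple selection rule. The helper below is hypothetical and not part of LLM Magic; it encodes one reasonable reading of the trade-offs above, and the inputs (batch count, whether accuracy or speed matters most) are assumptions about what an application would know up front.

```php
<?php

// Hypothetical helper, not part of LLM Magic: maps the trade-offs from the
// comparison above onto a strategy name.
function chooseStrategy(int $batchCount, bool $accuracyCritical, bool $speedCritical): string
{
    if ($accuracyCritical) {
        // Best accuracy, at the highest cost and latency.
        return 'double-pass';
    }

    if ($batchCount <= 1) {
        // A single batch loses nothing with the simple strategy.
        return 'simple';
    }

    // Multiple batches: trade context for speed only when speed matters.
    return $speedCritical ? 'parallel' : 'sequential';
}

$choice = chooseStrategy(8, false, true);
```

The returned string can be passed to `->strategy(...)`; appending `-auto-merge` to a strategy that has such a variant is a separate decision about cost versus coherence.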

Performance Tips

Adjust chunk size based on your model’s context window:
->chunkSize(100000)  // For larger context models
Control how many batches process simultaneously:
->concurrency(10)  // Process up to 10 batches at once
Track extraction progress with callbacks:
->onDataProgress(function (array $data) {
    Log::info('Extraction progress', ['data' => $data]);
})
