The ExtractionLLMBuilder provides a fluent interface for extracting structured data from documents using different extraction strategies. It’s designed for processing large documents and extracting specific information according to a schema.
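Every configuration method on this page returns `static`, and that is what makes the fluent interface work. The sketch below is a hypothetical, stripped-down builder showing the pattern: each setter stores its value and returns `$this`, so calls can be chained. The class name `MiniExtractionBuilder` and its `toArray()` method are illustrative assumptions, not the library's internals (assumes PHP 8.0+ for the `static` return type).

```php
<?php

declare(strict_types=1);

// Hypothetical, minimal fluent builder; NOT the library's actual implementation.
final class MiniExtractionBuilder
{
    private ?string $model = null;
    private ?string $strategy = null;
    private array $schema = [];

    // Each setter stores its value and returns the same instance for chaining.
    public function model(string $model): static
    {
        $this->model = $model;
        return $this;
    }

    public function strategy(?string $strategy): static
    {
        $this->strategy = $strategy;
        return $this;
    }

    public function schema(array $schema): static
    {
        $this->schema = $schema;
        return $this;
    }

    // Exposes the collected configuration so it can be inspected.
    public function toArray(): array
    {
        return [
            'model' => $this->model,
            'strategy' => $this->strategy,
            'schema' => $this->schema,
        ];
    }
}

$config = (new MiniExtractionBuilder())
    ->model('gpt-4o')
    ->strategy('simple')
    ->schema(['type' => 'object'])
    ->toArray();

print_r($config);
```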

Configuration Methods

model

Set the LLM model to use for extraction.
$builder->model('gpt-4o');
model
string|LLM
required
The model identifier string (e.g., ‘gpt-4o’, ‘claude-3-5-sonnet’) or an LLM instance
return
static
Returns the builder instance for method chaining

schema

Set the JSON schema for the data to extract.
$builder->schema([
    'type' => 'object',
    'properties' => [
        'invoice_number' => ['type' => 'string'],
        'total_amount' => ['type' => 'number'],
        'items' => [
            'type' => 'array',
            'items' => [
                'type' => 'object',
                'properties' => [
                    'name' => ['type' => 'string'],
                    'price' => ['type' => 'number']
                ]
            ]
        ]
    ]
]);
schema
array
required
JSON schema, given as a PHP array, defining the structure of the data to extract
return
static
Returns the builder instance for method chaining
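To make the schema above concrete, here is a hypothetical item that conforms to it, together with a deliberately minimal type checker that walks the same nested structure. `matchesSchema` and `objectMatches` are illustrative helpers, not part of the library; they handle only the `object`, `array`, `string`, and `number` types and ignore every other JSON Schema keyword (assumes PHP 8.1+ for `array_is_list`).

```php
<?php

declare(strict_types=1);

// Hypothetical helper: checks a value against a tiny subset of JSON Schema.
// Not a full validator; unsupported keywords are simply not checked.
function matchesSchema(mixed $value, array $schema): bool
{
    return match ($schema['type'] ?? null) {
        'string' => is_string($value),
        'number' => is_int($value) || is_float($value),
        'array' => is_array($value)
            && array_is_list($value)
            && (!isset($schema['items'])
                || array_reduce($value, fn (bool $ok, mixed $item) => $ok && matchesSchema($item, $schema['items']), true)),
        'object' => is_array($value) && objectMatches($value, $schema['properties'] ?? []),
        default => true,
    };
}

// Every property that is present must match its sub-schema; missing
// properties are allowed because the example schema has no "required" list.
function objectMatches(array $value, array $properties): bool
{
    foreach ($properties as $name => $propertySchema) {
        if (array_key_exists($name, $value) && !matchesSchema($value[$name], $propertySchema)) {
            return false;
        }
    }
    return true;
}

$schema = [
    'type' => 'object',
    'properties' => [
        'invoice_number' => ['type' => 'string'],
        'total_amount' => ['type' => 'number'],
        'items' => [
            'type' => 'array',
            'items' => [
                'type' => 'object',
                'properties' => [
                    'name' => ['type' => 'string'],
                    'price' => ['type' => 'number'],
                ],
            ],
        ],
    ],
];

// A conforming item, and one whose total is a string instead of a number.
$item = [
    'invoice_number' => 'INV-001',
    'total_amount' => 42.5,
    'items' => [['name' => 'Widget', 'price' => 42.5]],
];
$bad = ['invoice_number' => 'INV-002', 'total_amount' => '42.50'];

var_dump(matchesSchema($item, $schema)); // bool(true)
var_dump(matchesSchema($bad, $schema));  // bool(false)
```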

strategy

Set the extraction strategy to use.
$builder->strategy('parallel-auto-merge');
strategy
string|null
required
Strategy name: ‘simple’, ‘sequential’, ‘sequential-auto-merge’, ‘parallel’, ‘parallel-auto-merge’, ‘double-pass’, ‘double-pass-auto-merge’, or a registered custom strategy
return
static
Returns the builder instance for method chaining
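Several strategy names pair a base strategy with an `-auto-merge` suffix. This reference does not spell out the merge rules, so the following is only a hypothetical illustration of what combining per-chunk partial results can look like. `mergeChunkResults` and its policy (concatenate list values, keep the first non-null value for everything else) are assumptions, not the library's algorithm (assumes PHP 8.1+).

```php
<?php

declare(strict_types=1);

// Hypothetical merge policy for partial results extracted from several chunks:
// - list values (e.g. "items") are concatenated in chunk order;
// - all other values (scalars, nested objects) keep the first non-null value seen.
// Illustration only; not the library's merge algorithm.
function mergeChunkResults(array $partials): array
{
    $merged = [];
    foreach ($partials as $partial) {
        foreach ($partial as $key => $value) {
            if (is_array($value) && array_is_list($value)) {
                $merged[$key] = array_merge($merged[$key] ?? [], $value);
            } elseif (!isset($merged[$key])) {
                // isset() is false for missing keys and for stored nulls,
                // so a later non-null value can fill an earlier gap.
                $merged[$key] = $value;
            }
        }
    }
    return $merged;
}

$merged = mergeChunkResults([
    ['invoice_number' => 'INV-001', 'total_amount' => null, 'items' => [['name' => 'Widget', 'price' => 10]]],
    ['invoice_number' => null, 'total_amount' => 25, 'items' => [['name' => 'Gadget', 'price' => 15]]],
]);

print_r($merged);
```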

instructions

Set custom output instructions for the extraction.
$builder->instructions('Extract all financial data with high precision');
instructions
string|null
required
Additional instructions to guide the extraction process
return
static
Returns the builder instance for method chaining

Artifact Methods

file

Add a file to extract data from.
$builder->file('path/to/document.pdf');
$builder->file('path/to/invoice.pdf', disk: 's3', replace: false);
path
string
required
Path to the file to process
disk
string|null
default: null
Laravel filesystem disk name (null uses the default disk)
replace
bool
default: false
Whether to replace existing artifacts (true) or add to them (false)
return
static
Returns the builder instance for method chaining
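The `replace` flag decides whether a call discards previously added artifacts or appends to them. The sketch below models only that documented behavior with a hypothetical `ArtifactList` class that stores plain paths; the real builder stores Artifact objects and also handles the `disk` argument, which is omitted here (assumes PHP 8.0+ for named arguments).

```php
<?php

declare(strict_types=1);

// Hypothetical sketch of the documented `replace` flag:
// replace: true discards everything added before; replace: false appends.
final class ArtifactList
{
    /** @var string[] */
    private array $paths = [];

    public function file(string $path, bool $replace = false): static
    {
        if ($replace) {
            $this->paths = [];
        }
        $this->paths[] = $path;
        return $this;
    }

    /** @return string[] */
    public function all(): array
    {
        return $this->paths;
    }
}

// Default behavior: both files are kept.
$appended = (new ArtifactList())
    ->file('path/to/document.pdf')
    ->file('path/to/invoice.pdf')
    ->all();

// replace: true on the second call drops the first file.
$replaced = (new ArtifactList())
    ->file('path/to/document.pdf')
    ->file('path/to/invoice.pdf', replace: true)
    ->all();

print_r($appended);
print_r($replaced);
```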

files

Add multiple files to extract data from.
$builder->files([
    'path/to/document1.pdf',
    'path/to/document2.pdf'
]);
paths
string[]
required
Array of file paths to process
replace
bool
default: false
Whether to replace existing artifacts (true) or add to them (false)
return
static
Returns the builder instance for method chaining

artifact

Add a custom artifact to extract data from.
$builder->artifact(new CustomArtifact($data));
artifact
Artifact
required
An Artifact instance to process
replace
bool
default: false
Whether to replace existing artifacts (true) or add to them (false)
return
static
Returns the builder instance for method chaining

artifacts

Add multiple custom artifacts.
$builder->artifacts([
    new CustomArtifact($data1),
    new CustomArtifact($data2)
]);
artifacts
Artifact[]
required
Array of Artifact instances to process
replace
bool
default: false
Whether to replace existing artifacts (true) or add to them (false)
return
static
Returns the builder instance for method chaining

getArtifacts

Get the current artifacts array.
$artifacts = $builder->getArtifacts();
return
Artifact[]
Array of all added artifacts

Processing Options

chunkSize

Set the maximum chunk size in tokens for document splitting.
$builder->chunkSize(4000);
chunkSize
int|null
required
Maximum tokens per chunk, or null to use the default from configuration
return
static
Returns the builder instance for method chaining
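`chunkSize` caps how many tokens each piece of a split document may contain. The library's tokenizer and splitting rules are not described here, so this is a hypothetical sketch: it estimates tokens with the rough heuristic of about 4 characters per token and packs whole paragraphs into chunks that stay within the budget. `estimateTokens` and `chunkByTokens` are illustrative names, not library functions.

```php
<?php

declare(strict_types=1);

// Rough heuristic: about 4 characters per token for English text.
// Real tokenizers differ; this is only for illustration.
function estimateTokens(string $text): int
{
    return (int) ceil(strlen($text) / 4);
}

// Hypothetical chunker: packs whole paragraphs into chunks whose estimated
// token count stays within $maxTokens. A single paragraph larger than the
// budget becomes its own (oversized) chunk rather than being split further.
function chunkByTokens(string $document, int $maxTokens): array
{
    $chunks = [];
    $current = '';

    foreach (preg_split('/\n{2,}/', trim($document)) as $paragraph) {
        $candidate = $current === '' ? $paragraph : $current . "\n\n" . $paragraph;

        if ($current !== '' && estimateTokens($candidate) > $maxTokens) {
            $chunks[] = $current;   // close the current chunk
            $current = $paragraph;  // start a new chunk with this paragraph
        } else {
            $current = $candidate;
        }
    }

    if ($current !== '') {
        $chunks[] = $current;
    }

    return $chunks;
}

// Three paragraphs of roughly 10 estimated tokens each, with a 25-token budget:
// the first two fit together, the third starts a new chunk.
$document = str_repeat('a', 40) . "\n\n" . str_repeat('b', 40) . "\n\n" . str_repeat('c', 40);
$chunks = chunkByTokens($document, 25);

echo count($chunks), PHP_EOL; // 2
```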

contextOptions

Set context filtering and processing options.
$builder->contextOptions(ContextOptions::default());
filter
ContextOptions
required
ContextOptions instance defining how to process document context
return
static
Returns the builder instance for method chaining

Callback Methods

onMessage

Set a callback for completed messages during extraction.
$builder->onMessage(function(Message $message) {
    echo "Extraction message: " . $message->content;
});
onMessage
Closure(Message): void|null
required
Closure that receives each completed Message
return
static
Returns the builder instance for method chaining

onMessageProgress

Set a callback for streaming message progress.
$builder->onMessageProgress(function(Message $message) {
    echo $message->content;
});
onMessageProgress
Closure(Message): void|null
required
Closure that receives partial Messages during streaming
return
static
Returns the builder instance for method chaining

onTokenStats

Set a callback for token usage statistics.
$builder->onTokenStats(function(TokenStats $stats) {
    echo "Tokens: {$stats->total}";
});
onTokenStats
Closure(TokenStats): void|null
required
Closure that receives TokenStats with usage information
return
static
Returns the builder instance for method chaining

onDataProgress

Set a callback for extraction data progress.
$builder->onDataProgress(function(array $data) {
    echo "Extracted data: " . json_encode($data);
});
onDataProgress
Closure(array): void|null
required
Closure that receives extracted data arrays as they are processed
return
static
Returns the builder instance for method chaining

onActorTelemetry

Set a callback for actor telemetry data.
$builder->onActorTelemetry(function(ActorTelemetry $telemetry) {
    Log::info('Actor telemetry', $telemetry->toArray());
});
onActorTelemetry
Closure(ActorTelemetry): void|null
required
Closure that receives ActorTelemetry for each extraction actor
return
static
Returns the builder instance for method chaining

Tool Methods

tools

Register tools that the extraction model can call.
$builder->tools([
    // database_lookup() stands in for your own application function
    'lookup' => fn(string $id) => database_lookup($id),
]);
tools
mixed
required
Array of tools (closures or InvokableTool instances) with string keys as tool names
return
static
Returns the builder instance for method chaining
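Because tools are registered under string keys, a tool call from the model can be resolved by name and invoked with the arguments the model supplies. The sketch below illustrates that idea with a hypothetical `dispatchToolCall` function and an in-memory `$fakeDatabase` standing in for `database_lookup`; how the library actually parses and dispatches tool calls is not documented here (assumes PHP 8.1+, where string-keyed arrays can be spread as named arguments).

```php
<?php

declare(strict_types=1);

// Hypothetical dispatcher: looks up a tool by the name the model asked for and
// calls it with the model-supplied arguments (string keys become named arguments).
function dispatchToolCall(array $tools, string $name, array $arguments): mixed
{
    if (!array_key_exists($name, $tools) || !is_callable($tools[$name])) {
        throw new InvalidArgumentException("Unknown tool: {$name}");
    }

    return $tools[$name](...$arguments);
}

// In-memory stand-in for the database_lookup() helper used in the example above.
$fakeDatabase = ['inv-1' => ['invoice_number' => 'INV-001']];

$tools = [
    // Arrow functions capture $fakeDatabase by value automatically.
    'lookup' => fn (string $id) => $fakeDatabase[$id] ?? null,
];

$result = dispatchToolCall($tools, 'lookup', ['id' => 'inv-1']);

print_r($result);
```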

Execution Methods

stream

Execute the extraction with streaming, processing data as it is extracted.
$results = $builder->stream();
return
Collection
Laravel Collection of extracted data items matching the schema

send

Execute the extraction without streaming (waits for complete results).
$results = $builder->send();
return
Collection
Laravel Collection of extracted data items matching the schema
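Both methods return the same kind of result; the difference is when intermediate data becomes observable. As a conceptual sketch only, the hypothetical `runStreaming` and `runBlocking` functions below use plain arrays instead of the Laravel Collection the real methods return: the streaming variant reports accumulated data through a progress callback after each chunk (in the spirit of `onDataProgress`), while the blocking variant reports nothing until it returns.

```php
<?php

declare(strict_types=1);

// Hypothetical illustration of the two execution modes.
// Each entry in $chunkResults stands in for data extracted from one document chunk.

/** @param array<int, array> $chunkResults */
function runStreaming(array $chunkResults, callable $onDataProgress): array
{
    $collected = [];
    foreach ($chunkResults as $data) {
        $collected[] = $data;
        $onDataProgress($collected); // caller sees progress after every chunk
    }
    return $collected;
}

/** @param array<int, array> $chunkResults */
function runBlocking(array $chunkResults): array
{
    // Nothing is reported until every chunk has been processed.
    return array_values($chunkResults);
}

$chunkResults = [['name' => 'Widget'], ['name' => 'Gadget']];
$progressCalls = 0;

$streamed = runStreaming($chunkResults, function (array $soFar) use (&$progressCalls): void {
    $progressCalls++;
});
$sent = runBlocking($chunkResults);

echo $progressCalls, PHP_EOL; // 2
```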
