ExtractionLLMBuilder

The ExtractionLLMBuilder provides a fluent interface for extracting structured data from documents using different extraction strategies. It’s designed for processing large documents and extracting specific information according to a schema.

Configuration Methods

model

Set the LLM model to use for extraction.

$builder->model('gpt-4o');

model

string|LLM

required

The model identifier string (e.g., 'gpt-4o', 'claude-3-5-sonnet') or an LLM instance

return

static

Returns the builder instance for method chaining

schema

Set the JSON schema for the data to extract.

$builder->schema([
    'type' => 'object',
    'properties' => [
        'invoice_number' => ['type' => 'string'],
        'total_amount' => ['type' => 'number'],
        'items' => [
            'type' => 'array',
            'items' => [
                'type' => 'object',
                'properties' => [
                    'name' => ['type' => 'string'],
                    'price' => ['type' => 'number']
                ]
            ]
        ]
    ]
]);

schema

array

required

JSON schema array defining the structure of data to extract

return

static

Returns the builder instance for method chaining

strategy

Set the extraction strategy to use.

$builder->strategy('parallel-auto-merge');

strategy

string|null

required

Strategy name: 'simple', 'sequential', 'sequential-auto-merge', 'parallel', 'parallel-auto-merge', 'double-pass', 'double-pass-auto-merge', or a registered custom strategy

return

static

Returns the builder instance for method chaining

instructions

Set custom output instructions for the extraction.

$builder->instructions('Extract all financial data with high precision');

instructions

string|null

required

Additional instructions to guide the extraction process

return

static

Returns the builder instance for method chaining
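
Because each configuration method returns the builder, the four calls above can be chained into a single expression. The following is a minimal sketch, not the library's own example: it assumes `$builder` is an ExtractionLLMBuilder instance obtained elsewhere (this reference does not show how one is created), and the function name `configureInvoiceExtraction` and the schema fields are illustrative.

```php
<?php

/**
 * Apply the four configuration methods documented above in one chain.
 *
 * $builder is assumed to be an ExtractionLLMBuilder instance; how it is
 * created is not covered here. The function name is hypothetical.
 */
function configureInvoiceExtraction(object $builder): object
{
    return $builder
        ->model('gpt-4o')
        ->schema([
            'type' => 'object',
            'properties' => [
                'invoice_number' => ['type' => 'string'],
                'total_amount' => ['type' => 'number'],
            ],
        ])
        ->strategy('sequential')
        ->instructions('Extract the invoice number and the total amount.');
}
```

The return value is whatever the last chained call returns, which according to the descriptions above is the same builder instance.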

Artifact Methods

file

Add a file to extract data from.

$builder->file('path/to/document.pdf');
$builder->file('path/to/invoice.pdf', disk: 's3', replace: false);

path

string

required

Path to the file to process

disk

string|null

default: null

Laravel disk name for file storage (null uses default disk)

replace

bool

default: false

Whether to replace existing artifacts (true) or add to them (false)

return

static

Returns the builder instance for method chaining

files

Add multiple files to extract data from.

$builder->files([
    'path/to/document1.pdf',
    'path/to/document2.pdf'
]);

paths

string[]

required

Array of file paths to process

replace

bool

default: false

Whether to replace existing artifacts (true) or add to them (false)

return

static

Returns the builder instance for method chaining

artifact

Add a custom artifact to extract data from.

$builder->artifact(new CustomArtifact($data));

artifact

Artifact

required

An Artifact instance to process

replace

bool

default: false

Whether to replace existing artifacts (true) or add to them (false)

return

static

Returns the builder instance for method chaining

artifacts

Add multiple custom artifacts.

$builder->artifacts([
    new CustomArtifact($data1),
    new CustomArtifact($data2)
]);

artifacts

Artifact[]

required

Array of Artifact instances to process

replace

bool

default: false

Whether to replace existing artifacts (true) or add to them (false)

return

static

Returns the builder instance for method chaining

getArtifacts

Get the current artifacts array.

$artifacts = $builder->getArtifacts();

return

Artifact[]

Array of all added artifacts
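
The `replace` flag and `getArtifacts` can be used together to inspect what is queued before running an extraction. The sketch below assumes `$builder` is an ExtractionLLMBuilder instance and that `replace: true` behaves as the parameter description states; the function name `replaceQueuedFiles` and the file paths are placeholders.

```php
<?php

/**
 * Queue two files, then swap the whole queue for a single file.
 *
 * $builder is assumed to be an ExtractionLLMBuilder instance.
 * The function name and paths are hypothetical.
 */
function replaceQueuedFiles(object $builder): array
{
    // Both files are added to the existing artifacts (replace defaults to false).
    $builder->files(['path/to/document1.pdf', 'path/to/document2.pdf']);

    // replace: true is described as replacing the existing artifacts.
    $builder->file('path/to/invoice.pdf', replace: true);

    return $builder->getArtifacts();
}
```

If the flag works as documented, the returned array should contain only the artifact for the last file.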

Processing Options

chunkSize

Set the maximum chunk size in tokens for document splitting.

$builder->chunkSize(4000);

chunkSize

int|null

required

Maximum tokens per chunk, or null to use the default from configuration

return

static

Returns the builder instance for method chaining

contextOptions

Set context filtering and processing options.

$builder->contextOptions(ContextOptions::default());

filter

ContextOptions

required

ContextOptions instance defining how to process document context

return

static

Returns the builder instance for method chaining

Callback Methods

onMessage

Set a callback for completed messages during extraction.

$builder->onMessage(function(Message $message) {
    echo "Extraction message: " . $message->content;
});

onMessage

Closure(Message): void|null

required

Closure that receives each completed Message

return

static

Returns the builder instance for method chaining

onMessageProgress

Set a callback for streaming message progress.

$builder->onMessageProgress(function(Message $message) {
    echo $message->content;
});

onMessageProgress

Closure(Message): void|null

required

Closure that receives partial Messages during streaming

return

static

Returns the builder instance for method chaining

onTokenStats

Set a callback for token usage statistics.

$builder->onTokenStats(function(TokenStats $stats) {
    echo "Tokens: {$stats->total}";
});

onTokenStats

Closure(TokenStats): void|null

required

Closure that receives TokenStats with usage information

return

static

Returns the builder instance for method chaining

onDataProgress

Set a callback for extraction data progress.

$builder->onDataProgress(function(array $data) {
    echo "Extracted data: " . json_encode($data);
});

onDataProgress

Closure(array): void|null

required

Closure that receives extracted data arrays as they are processed

return

static

Returns the builder instance for method chaining

onActorTelemetry

Set a callback for actor telemetry data.

$builder->onActorTelemetry(function(ActorTelemetry $telemetry) {
    Log::info('Actor telemetry', $telemetry->toArray());
});

onActorTelemetry

Closure(ActorTelemetry): void|null

required

Closure that receives ActorTelemetry for each extraction actor

return

static

Returns the builder instance for method chaining
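
The callback examples above echo or log their input. A callback can also record what it receives for later inspection. The sketch below is one possible pattern, assuming `$builder` is an ExtractionLLMBuilder instance; the function name `attachProgressLog` and the shape of the `$log` entries are illustrative, and `$stats->total` is taken from the onTokenStats example above.

```php
<?php

/**
 * Attach callbacks that record progress instead of echoing it.
 *
 * $builder is assumed to be an ExtractionLLMBuilder instance; $log is a
 * plain array owned by the caller and filled in by reference.
 * The function name and log entry shape are hypothetical.
 */
function attachProgressLog(object $builder, array &$log): object
{
    return $builder
        ->onDataProgress(function (array $data) use (&$log): void {
            // Keep every intermediate data array as it arrives.
            $log[] = ['type' => 'data', 'payload' => $data];
        })
        ->onTokenStats(function ($stats) use (&$log): void {
            // $stats->total is the property used in the onTokenStats example.
            $log[] = ['type' => 'tokens', 'payload' => $stats->total];
        });
}
```

Capturing `$log` by reference keeps the closures free of global state, so the caller decides where the recorded entries end up.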

Tool Methods

tools

Register the tools available during extraction.

$builder->tools([
    'lookup' => fn(string $id) => database_lookup($id)
]);

tools

mixed

required

Array of tools (closures or InvokableTool instances) with string keys as tool names

return

static

Returns the builder instance for method chaining

Execution Methods

stream

Execute the extraction with streaming (processes data as it’s extracted).

$results = $builder->stream();

return

Collection

Laravel Collection of extracted data items matching the schema

send

Execute the extraction without streaming (waits for complete results).

$results = $builder->send();

return

Collection

Laravel Collection of extracted data items matching the schema
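
Since `stream` and `send` are both described as returning the extracted items, the choice between them can be made at call time. The following sketch puts a full call sequence together using only methods documented on this page. It assumes `$builder` is an ExtractionLLMBuilder instance; the function name `extractInvoice`, the `$streaming` flag, and the schema are illustrative.

```php
<?php

/**
 * Configure a builder for one file and run it with or without streaming.
 *
 * $builder is assumed to be an ExtractionLLMBuilder instance.
 * The function name and the $streaming flag are hypothetical.
 */
function extractInvoice(object $builder, string $path, bool $streaming = false): mixed
{
    $configured = $builder
        ->model('gpt-4o')
        ->schema([
            'type' => 'object',
            'properties' => [
                'invoice_number' => ['type' => 'string'],
                'total_amount' => ['type' => 'number'],
            ],
        ])
        ->file($path)
        ->chunkSize(4000);

    // Both execution methods are documented as returning a Collection.
    return $streaming ? $configured->stream() : $configured->send();
}
```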
