Overview
The extract() function is the primary API for Struktur. It takes artifacts, a JSON schema, and an extraction strategy, then returns validated, structured data.
Function signature
export const extract = async <T>(
options: ExtractionOptions<T>,
): Promise<ExtractionResult<T>>
Parameters
options (ExtractionOptions<T>): Configuration object for the extraction process. It accepts the following properties:
- artifacts: Array of artifacts to extract data from. Each artifact represents a pre-parsed document with text and optional media content.
- schema: JSON Schema definition for the output type. Use JSONSchemaType<T> from Ajv for type-safe results.
- strategy: The extraction strategy to use (e.g., simple(), parallel(), sequential()). Strategies define how artifacts are chunked, processed, and merged.
- events: Optional event handlers for monitoring extraction progress.
  - onTokenUsage: Called after each LLM call with token usage information.
Returns
A promise that resolves to the extraction result.
- data: The extracted data, validated against the provided schema. If extraction fails, this will be null (cast to T).
- error: Present if the extraction failed. When an error occurs, data will be null and usage will show zero tokens.
- usage: Token usage for the extraction ({ inputTokens, outputTokens, totalTokens }).
Basic example
import { extract, simple } from "@mateffy/struktur";
import type { JSONSchemaType } from "ajv";
import { google } from "@ai-sdk/google";
type Output = { title: string; description: string };
const schema: JSONSchemaType<Output> = {
  type: "object",
  properties: {
    title: { type: "string" },
    description: { type: "string" },
  },
  required: ["title", "description"],
  additionalProperties: false,
};
// myArtifact: an artifact you have already prepared (see Artifacts)
const result = await extract({
  artifacts: [myArtifact],
  schema,
  strategy: simple({ model: google("gemini-2.0-flash-exp") }),
});
if (result.error) {
  console.error("Extraction failed:", result.error);
} else {
  console.log(result.data.title);
  console.log("Used", result.usage.totalTokens, "tokens");
}
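Because a failed extraction still resolves to a result object, callers branch on error before touching data. The sketch below mirrors the documented fields in a hypothetical local type, ExtractionResultLike<T> (an assumption, not Struktur's exported ExtractionResult<T>), and adds an unwrapResult helper, also hypothetical, that rethrows the captured error so downstream code can treat data as a plain T.

```typescript
// Hypothetical local type mirroring the documented result fields.
// It is NOT imported from @mateffy/struktur; adjust it to the real ExtractionResult<T>.
type TokenUsage = {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
};

type ExtractionResultLike<T> = {
  data: T; // null (cast to T) when extraction failed
  usage: TokenUsage;
  error?: Error; // assumed Error-like, since the docs read error.message
};

// Rethrows the captured error, otherwise returns the validated data.
function unwrapResult<T>(result: ExtractionResultLike<T>): T {
  if (result.error) {
    throw result.error;
  }
  return result.data;
}

// Usage with a hand-built result standing in for extract() output.
const ok: ExtractionResultLike<{ title: string }> = {
  data: { title: "Quarterly report" },
  usage: { inputTokens: 120, outputTokens: 30, totalTokens: 150 },
};
console.log(unwrapResult(ok).title); // prints "Quarterly report"
```

This trades the no-throw contract for ordinary exception flow, which can be convenient inside code that already uses try/catch.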
Parallel extraction example
import { extract, parallel } from "@mateffy/struktur";
import { google } from "@ai-sdk/google";
// multiPageDocument: an array of artifacts; schema: defined as in the basic example
const result = await extract({
  artifacts: multiPageDocument,
  schema,
  strategy: parallel({
    model: google("gemini-2.0-flash-exp"),
    mergeModel: google("gemini-2.0-flash-exp"),
    chunkSize: 10_000,
    concurrency: 4,
  }),
});
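The concurrency: 4 option suggests that at most four chunk extractions are in flight at once. As a conceptual illustration only (this page does not describe Struktur's internals, and this is not its implementation), a generic bounded-concurrency map looks like the following; mapWithConcurrency is a hypothetical name.

```typescript
// Conceptual sketch: run `worker` over `items` with at most `limit` calls in flight.
// Not Struktur's code; it only illustrates what a concurrency cap means.
async function mapWithConcurrency<I, O>(
  items: readonly I[],
  limit: number,
  worker: (item: I, index: number) => Promise<O>,
): Promise<O[]> {
  const results: O[] = new Array(items.length);
  let next = 0; // index of the next item to start

  // Each runner keeps pulling items until none are left.
  // Check-and-increment happens synchronously, so no two runners take the same index.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start min(limit, items.length) runners (at least one) and wait for all of them.
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results; // results stay in input order
}
```

With a limit of 4 and ten chunks, at most four worker calls are pending at any moment, and results come back in input order regardless of which call finishes first.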
With event handlers
import { extract, simple } from "@mateffy/struktur";
// artifacts, schema and model: defined as in the previous examples
const result = await extract({
  artifacts,
  schema,
  strategy: simple({ model }),
  events: {
    onStep: ({ step, total, label }) => {
      console.log(`Step ${step}/${total}: ${label}`);
    },
    onTokenUsage: ({ inputTokens, outputTokens, model }) => {
      console.log(`${model}: ${inputTokens} in, ${outputTokens} out`);
    },
  },
});
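Since onTokenUsage fires after each LLM call, and strategies such as parallel() can use separate extraction and merge models, per-model running totals can be useful. This is a minimal sketch assuming the handler receives { inputTokens, outputTokens, model } as destructured in the example above, with model treated as a string (an assumption; check the Types page). createUsageTracker is a hypothetical helper, not part of Struktur.

```typescript
// Assumed payload shape, based on the fields destructured in the example above.
type TokenUsageEvent = {
  inputTokens: number;
  outputTokens: number;
  model: string; // assumption: the model identifier arrives as a string
};

type ModelTotals = { inputTokens: number; outputTokens: number; calls: number };

// Hypothetical helper: returns an onTokenUsage handler plus the per-model totals map.
function createUsageTracker() {
  const totals = new Map<string, ModelTotals>();

  const onTokenUsage = ({ inputTokens, outputTokens, model }: TokenUsageEvent) => {
    const current = totals.get(model) ?? { inputTokens: 0, outputTokens: 0, calls: 0 };
    totals.set(model, {
      inputTokens: current.inputTokens + inputTokens,
      outputTokens: current.outputTokens + outputTokens,
      calls: current.calls + 1,
    });
  };

  return { onTokenUsage, totals };
}

// Usage: pass tracker.onTokenUsage as events.onTokenUsage in extract().
// Here the handler is called directly to simulate two LLM calls.
const tracker = createUsageTracker();
tracker.onTokenUsage({ inputTokens: 100, outputTokens: 20, model: "gemini-2.0-flash-exp" });
tracker.onTokenUsage({ inputTokens: 50, outputTokens: 10, model: "gemini-2.0-flash-exp" });
console.log(tracker.totals.get("gemini-2.0-flash-exp"));
// prints { inputTokens: 150, outputTokens: 30, calls: 2 }
```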
Error handling
The extract() function catches all errors and returns them in the result object rather than throwing:
const result = await extract({ artifacts, schema, strategy });
if (result.error) {
  // Extraction failed: handle the error
  console.error("Failed:", result.error.message);
  // result.data will be null (cast to T)
  // result.usage will show { inputTokens: 0, outputTokens: 0, totalTokens: 0 }
} else {
  // Success: use result.data
  console.log(result.data);
}
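Because extract() reports failures through result.error instead of rejecting, a try/catch retry loop around it would never trigger. A minimal sketch of a retry helper that inspects error instead; retryOnError and the ResultWithError shape are hypothetical, and retrying on every error is an assumption (you may prefer to retry only transient failures, since each attempt spends tokens again).

```typescript
// Hypothetical minimal shape: only the field the retry logic needs.
type ResultWithError = { error?: Error };

// Re-runs `attempt` until it returns a result without `error`,
// or until `maxAttempts` runs have been made. Returns the last result either way.
async function retryOnError<R extends ResultWithError>(
  attempt: () => Promise<R>,
  maxAttempts = 3,
  delayMs = 0,
): Promise<R> {
  let result = await attempt();
  for (let tries = 1; tries < maxAttempts && result.error; tries++) {
    // Optional fixed pause between attempts.
    if (delayMs > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
    result = await attempt();
  }
  return result;
}

// Usage with extract(), assuming artifacts, schema and strategy are defined:
// const result = await retryOnError(() => extract({ artifacts, schema, strategy }));
```

Returning the last result, rather than throwing, keeps the same no-throw contract as extract() itself, so the error-handling branch above still applies unchanged.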
See also
- Types - Core TypeScript types
- Strategies - Available extraction strategies
- Artifacts - Working with artifacts