This guide will get you extracting structured data from artifacts in minutes.
Prerequisites
TypeScript 5.x or later
Node.js, Bun, or another JavaScript runtime
An API key for OpenAI, Anthropic, Google AI, or OpenRouter
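Before running the examples, it can help to confirm that a key is actually visible to your process. The sketch below is one way to do that. The environment variable names are assumptions based on the defaults commonly read by the AI SDK provider packages, not something Struktur defines, so check them against your provider's documentation. `configuredProviders` is a hypothetical helper name.

```typescript
// Assumed default environment variable names per provider package.
// These are not defined by Struktur; verify them against your provider's docs.
const DEFAULT_KEY_VARS = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  google: "GOOGLE_GENERATIVE_AI_API_KEY",
  openrouter: "OPENROUTER_API_KEY",
} as const;

type Provider = keyof typeof DEFAULT_KEY_VARS;

// Returns the providers whose key is present and non-empty in `env`.
function configuredProviders(
  env: Record<string, string | undefined>,
): Provider[] {
  return (Object.keys(DEFAULT_KEY_VARS) as Provider[]).filter((provider) => {
    const value = env[DEFAULT_KEY_VARS[provider]];
    return typeof value === "string" && value.trim() !== "";
  });
}
```

In Node.js or Bun you could call `configuredProviders(process.env)` at startup and stop early if it returns an empty array.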
Here’s a complete example that extracts a title from an artifact:
import { extract, simple } from "@mateffy/struktur";
import type { JSONSchemaType } from "ajv";
import { google } from "@ai-sdk/google";

// Define your output type
type Output = { title: string };

// Create a JSON schema for validation
const schema: JSONSchemaType<Output> = {
  type: "object",
  properties: { title: { type: "string" } },
  required: ["title"],
  additionalProperties: false,
};

// Create an artifact with some text
const artifacts = [
  {
    id: "doc-1",
    type: "text" as const,
    raw: async () => Buffer.from(""),
    contents: [{ text: "Document Title: Getting Started with Struktur" }],
  },
];

// Extract structured data
const result = await extract({
  artifacts,
  schema,
  strategy: simple({ model: google("gemini-2.0-flash-exp") }),
});

console.log(result.data.title);
// Output: "Getting Started with Struktur"
Understanding the components
Define your output type
Create a TypeScript type for the data you want to extract:
type Output = { title: string };
Create a JSON schema
Use Ajv’s JSONSchemaType for type-safe validation:
const schema: JSONSchemaType<Output> = {
  type: "object",
  properties: { title: { type: "string" } },
  required: ["title"],
  additionalProperties: false,
};
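To make concrete what this schema accepts and rejects, the sketch below hand-writes its three rules (object type, required string `title`, no additional properties) as a type guard. It is an illustration only: `isOutput` is a hypothetical name, and per this guide the actual validation is driven by the JSON schema, not by code like this.

```typescript
type Output = { title: string };

// Hand-written equivalent of the schema's rules, for illustration only.
function isOutput(value: unknown): value is Output {
  // type: "object" (JSON Schema does not treat arrays or null as objects)
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return (
    typeof record.title === "string" && // required: ["title"], type: "string"
    Object.keys(record).every((key) => key === "title") // additionalProperties: false
  );
}

isOutput({ title: "Getting Started" }); // true
isOutput({ title: "Getting Started", author: "x" }); // false: extra property
isOutput({ title: 42 }); // false: wrong type
isOutput({}); // false: missing title
```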
Prepare your artifacts
Artifacts are normalized document representations with text and optional media:
const artifacts = [{
  id: "doc-1",
  type: "text" as const, // keep the literal type, as in the complete example
  raw: async () => Buffer.from(""),
  contents: [{ text: "Your document text" }],
}];
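If you build many text artifacts by hand, a small factory keeps the shape consistent. The sketch below is a hypothetical helper, not part of Struktur. `TextArtifact` is a local stand-in that covers only the fields shown in this guide, and it uses `Uint8Array` in place of `Buffer` to stay dependency-free. When passing the result to Struktur you would likely want `Buffer.from("")` as in the guide's own examples.

```typescript
// Local stand-in for the artifact shape shown in this guide; the real type
// exported by Struktur may carry more fields.
type TextArtifact = {
  id: string;
  type: "text";
  raw: () => Promise<Uint8Array>;
  contents: { text: string }[];
};

// Hypothetical helper: one content entry per non-empty block of text.
function textArtifact(id: string, ...blocks: string[]): TextArtifact {
  const contents = blocks
    .map((text) => text.trim())
    .filter((text) => text.length > 0)
    .map((text) => ({ text }));
  return {
    id,
    type: "text",
    // Plain text has no original binary, so raw() yields empty bytes.
    raw: async () => new Uint8Array(0),
    contents,
  };
}

const artifact = textArtifact("doc-1", "Document Title: Getting Started with Struktur");
```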
Choose a strategy
Pick an extraction strategy based on your document size:
strategy: simple({ model: google("gemini-2.0-flash-exp") })
Extract and validate
Call extract() to get validated, type-safe results:
const result = await extract({ artifacts, schema, strategy });
console.log(result.data); // Fully typed!
Extract nested objects and arrays:
import { extract, simple } from "@mateffy/struktur";
import type { JSONSchemaType } from "ajv";
import { anthropic } from "@ai-sdk/anthropic";

type Product = {
  name: string;
  price: number;
  features: string[];
};

const schema: JSONSchemaType<Product> = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number" },
    features: { type: "array", items: { type: "string" } },
  },
  required: ["name", "price", "features"],
  additionalProperties: false,
};

const artifacts = [{
  id: "product",
  type: "text" as const, // keep the literal type, as in the first example
  raw: async () => Buffer.from(""),
  contents: [{
    text: `
      Laptop Pro 15
      Price: $1299
      Features: 16GB RAM, 512GB SSD, 15" Retina Display
    `
  }],
}];

const result = await extract({
  artifacts,
  schema,
  strategy: simple({ model: anthropic("claude-3-5-haiku-20241022") }),
});

console.log(result.data);
// {
//   name: "Laptop Pro 15",
//   price: 1299,
//   features: ["16GB RAM", "512GB SSD", "15\" Retina Display"]
// }
Processing larger documents
For documents that exceed context limits, use the parallel strategy:
import { extract, parallel } from "@mateffy/struktur";
import { google } from "@ai-sdk/google";

const result = await extract({
  artifacts, // Can be multiple artifacts or large documents
  schema,
  strategy: parallel({
    model: google("gemini-2.0-flash-exp"),
    mergeModel: google("gemini-2.0-flash-exp"),
    chunkSize: 10_000, // Token budget per chunk
    concurrency: 4, // Process 4 chunks at once
  }),
});
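To build intuition for what `chunkSize` and `concurrency` control, here is a minimal sketch of the general technique: split text into chunks under a token budget, then process the chunks with a bounded number of calls in flight. This is not Struktur's implementation. The four-characters-per-token estimate, the paragraph-based splitting, and the names `chunkByBudget` and `mapWithConcurrency` are all assumptions made for illustration, and the merge step handled by `mergeModel` is not shown.

```typescript
// Rough token estimate: ~4 characters per token (an assumption, not a tokenizer).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Greedily packs paragraphs into chunks that stay within `budget` tokens.
// A single paragraph larger than the budget becomes its own oversized chunk.
function chunkByBudget(text: string, budget: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of text.split(/\n{2,}/)) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (current && estimateTokens(candidate) > budget) {
      chunks.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Runs `worker` over `items` with at most `limit` calls in flight,
// returning results in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each runner repeatedly claims the next unprocessed index.
  const run = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  };
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, run));
  return results;
}
```

With the settings from the example above, this corresponds roughly to `mapWithConcurrency(chunkByBudget(text, 10_000), 4, extractChunk)`, where `extractChunk` is a placeholder for the per-chunk model call.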
Loading artifacts from files
Use urlToArtifact or fileToArtifact to load pre-serialized artifacts:
import { extract, simple, urlToArtifact, fileToArtifact } from "@mateffy/struktur";

// Load from a URL
const remoteArtifact = await urlToArtifact("https://example.com/artifact.json");

// Or from a file (read with Bun here)
const buffer = await Bun.file("artifact.json").arrayBuffer();
const localArtifact = await fileToArtifact(Buffer.from(buffer), {
  mimeType: "application/json",
});

// schema and model are defined as in the earlier examples
const result = await extract({
  artifacts: [localArtifact], // or [remoteArtifact]
  schema,
  strategy: simple({ model }),
});
Struktur expects pre-parsed artifacts. It doesn’t parse PDFs or HTML directly. You’ll need to convert documents to the artifact format using custom providers.
Tracking progress
Use event handlers to monitor extraction progress:
const result = await extract({
  artifacts,
  schema,
  strategy: parallel({ model, mergeModel: model, chunkSize: 10_000 }),
  events: {
    onStep: ({ step, total, label }) => {
      console.log(`Step ${step}/${total}: ${label}`);
    },
    onProgress: ({ current, total, percent }) => {
      console.log(`Progress: ${percent}%`);
    },
    onTokenUsage: ({ inputTokens, outputTokens, totalTokens }) => {
      console.log(`Tokens: ${totalTokens}`);
    },
    onMessage: ({ role, content }) => {
      console.log(`[${role}]`, content);
    },
  },
});
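Instead of logging every event, you may prefer to accumulate them and report once at the end. The sketch below builds handlers with the same payload fields as the example above and keeps running totals. `createUsageTracker` is a hypothetical helper, and it assumes each `onTokenUsage` event reports the usage of a single call rather than a cumulative total; if Struktur reports cumulative values, you would store the latest event instead of summing.

```typescript
// Payload shapes copied from the handlers in the example above.
type TokenUsage = { inputTokens: number; outputTokens: number; totalTokens: number };
type Progress = { current: number; total: number; percent: number };

function createUsageTracker() {
  const totals: TokenUsage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };
  let lastPercent = 0;
  return {
    // Pass this object as (or spread it into) the `events` option.
    events: {
      // Assumption: each event carries per-call counts, so we sum them.
      onTokenUsage: (usage: TokenUsage): void => {
        totals.inputTokens += usage.inputTokens;
        totals.outputTokens += usage.outputTokens;
        totals.totalTokens += usage.totalTokens;
      },
      onProgress: ({ percent }: Progress): void => {
        lastPercent = percent;
      },
    },
    summary: (): string =>
      `${lastPercent}% done, ${totals.totalTokens} tokens ` +
      `(${totals.inputTokens} in / ${totals.outputTokens} out)`,
  };
}
```

A possible usage is `const tracker = createUsageTracker();`, then `events: tracker.events` in the `extract` call, followed by `console.log(tracker.summary())` once it resolves.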
Next steps
Core concepts: Learn about extraction strategies and when to use each.
API reference: Explore the complete API documentation.
Examples: See real-world examples and patterns.
CLI guide: Use Struktur from the command line.