This guide will get you extracting structured data from artifacts in minutes.

Prerequisites

  • TypeScript 5.x or later
  • Node.js, Bun, or another JavaScript runtime
  • An API key for OpenAI, Anthropic, Google AI, or OpenRouter

Basic extraction

Here’s a complete example that extracts a title from an artifact:
import { extract, simple } from "@mateffy/struktur";
import type { JSONSchemaType } from "ajv";
import { google } from "@ai-sdk/google";

// Define your output type
type Output = { title: string };

// Create a JSON schema for validation
const schema: JSONSchemaType<Output> = {
  type: "object",
  properties: { title: { type: "string" } },
  required: ["title"],
  additionalProperties: false,
};

// Create an artifact with some text
const artifacts = [
  {
    id: "doc-1",
    type: "text" as const,
    raw: async () => Buffer.from(""),
    contents: [{ text: "Document Title: Getting Started with Struktur" }],
  },
];

// Extract structured data
const result = await extract({
  artifacts,
  schema,
  strategy: simple({ model: google("gemini-2.0-flash-exp") }),
});

console.log(result.data.title);
// Output: "Getting Started with Struktur"

Understanding the components

1. Define your output type

Create a TypeScript type for the data you want to extract:
type Output = { title: string };
2. Create a JSON schema

Use Ajv’s JSONSchemaType for type-safe validation:
const schema: JSONSchemaType<Output> = {
  type: "object",
  properties: { title: { type: "string" } },
  required: ["title"],
  additionalProperties: false,
};
3. Prepare your artifacts

Artifacts are normalized document representations with text and optional media:
const artifacts = [{
  id: "doc-1",
  type: "text" as const,
  raw: async () => Buffer.from(""),
  contents: [{ text: "Your document text" }],
}];
4. Choose a strategy

Pick an extraction strategy based on your document size. The simple strategy suits documents that fit within the model's context limit:
const strategy = simple({ model: google("gemini-2.0-flash-exp") });
5. Extract and validate

Call extract() to get validated, type-safe results:
const result = await extract({ artifacts, schema, strategy });
console.log(result.data); // Fully typed!
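
The artifact literal from step 3 is repetitive to write by hand. Below is a minimal sketch of a helper that builds the same shape from a string. `TextArtifact` and `textArtifact` are hypothetical names that mirror the fields shown in this guide; they are not exports of Struktur. Filling `raw` with the UTF-8 bytes of the text is also an assumption, since the examples above pass an empty buffer.

```typescript
// Hypothetical local type mirroring the artifact fields used in this guide.
// Struktur's own artifact type may carry more fields; check the API reference.
type TextArtifact = {
  id: string;
  type: "text";
  raw: () => Promise<Buffer>;
  contents: { text: string }[];
};

// Build a single-content text artifact from a plain string.
function textArtifact(id: string, text: string): TextArtifact {
  return {
    id,
    type: "text",
    // Assumption: raw() should return the original bytes of the document.
    raw: async () => Buffer.from(text, "utf8"),
    contents: [{ text }],
  };
}

const artifacts = [
  textArtifact("doc-1", "Document Title: Getting Started with Struktur"),
];
```

The resulting array has the same fields as the hand-written literal, so it should be usable as the `artifacts` argument of extract().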

Extracting complex data

Schemas can combine several fields, including arrays and nested objects. This example extracts a product with a list of features:
import { extract, simple } from "@mateffy/struktur";
import type { JSONSchemaType } from "ajv";
import { anthropic } from "@ai-sdk/anthropic";

type Product = {
  name: string;
  price: number;
  features: string[];
};

const schema: JSONSchemaType<Product> = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number" },
    features: { type: "array", items: { type: "string" } },
  },
  required: ["name", "price", "features"],
  additionalProperties: false,
};

const artifacts = [{
  id: "product",
  type: "text" as const,
  raw: async () => Buffer.from(""),
  contents: [{
    text: `
      Laptop Pro 15
      Price: $1299
      Features: 16GB RAM, 512GB SSD, 15" Retina Display
    `
  }],
}];

const result = await extract({
  artifacts,
  schema,
  strategy: simple({ model: anthropic("claude-3-5-haiku-20241022") }),
});

console.log(result.data);
// {
//   name: "Laptop Pro 15",
//   price: 1299,
//   features: ["16GB RAM", "512GB SSD", "15\" Retina Display"]
// }
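
The product example covers arrays. A nested object follows the same JSON Schema conventions: the inner object declares its own `properties` and `required` list. The sketch below shows this with a hypothetical `dimensions` field that is not part of the original example. It is written as a plain object literal so it runs without dependencies; in a real project you would likely annotate it with Ajv's JSONSchemaType as in the examples above.

```typescript
// Hypothetical extension of the Product type with a nested object.
type ProductWithDimensions = {
  name: string;
  price: number;
  features: string[];
  dimensions: { widthCm: number; heightCm: number };
};

// Plain JSON Schema literal. The nested object carries its own
// properties, required list, and additionalProperties flag.
const nestedSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number" },
    features: { type: "array", items: { type: "string" } },
    dimensions: {
      type: "object",
      properties: {
        widthCm: { type: "number" },
        heightCm: { type: "number" },
      },
      required: ["widthCm", "heightCm"],
      additionalProperties: false,
    },
  },
  required: ["name", "price", "features", "dimensions"],
  additionalProperties: false,
};

// A value of the shape this schema describes (illustrative data only).
const example: ProductWithDimensions = {
  name: "Laptop Pro 15",
  price: 1299,
  features: ["16GB RAM"],
  dimensions: { widthCm: 35, heightCm: 24 },
};
```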

Processing larger documents

For documents that exceed context limits, use the parallel strategy:
import { extract, parallel } from "@mateffy/struktur";
import { google } from "@ai-sdk/google";

const result = await extract({
  artifacts, // Can be multiple artifacts or large documents
  schema,
  strategy: parallel({
    model: google("gemini-2.0-flash-exp"),
    mergeModel: google("gemini-2.0-flash-exp"),
    chunkSize: 10_000,  // Token budget per chunk
    concurrency: 4,      // Process 4 chunks at once
  }),
});
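
To get a feel for how `chunkSize` and `concurrency` interact, a back-of-envelope estimate can help. The sketch below assumes roughly 4 characters per token, a common rule of thumb for English text. It does not reproduce Struktur's tokenizer or its actual chunking logic, and `estimateChunks` and `estimateWaves` are hypothetical helper names.

```typescript
// Rough heuristic: about 4 characters per token for English text.
// This is an estimate, not Struktur's tokenizer or chunking logic.
const CHARS_PER_TOKEN = 4;

// Estimate how many chunks a text needs for a given per-chunk token budget.
function estimateChunks(text: string, chunkSize: number): number {
  const tokens = Math.ceil(text.length / CHARS_PER_TOKEN);
  return Math.max(1, Math.ceil(tokens / chunkSize));
}

// Number of sequential rounds when `concurrency` chunks run at once.
function estimateWaves(chunks: number, concurrency: number): number {
  return Math.ceil(chunks / concurrency);
}

const doc = "x".repeat(100_000); // about 25,000 tokens under the heuristic
const chunks = estimateChunks(doc, 10_000); // 3
const waves = estimateWaves(chunks, 4); // 1
console.log({ chunks, waves });
```

Under these assumptions, a 100,000-character document with the settings above would be split into about three chunks that all run in a single round.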

Loading artifacts from files

Use urlToArtifact or fileToArtifact to load pre-serialized artifacts:
import { extract, simple, urlToArtifact, fileToArtifact } from "@mateffy/struktur";
import { google } from "@ai-sdk/google";

// Model used by the strategy below
const model = google("gemini-2.0-flash-exp");

// Load from a URL
const remoteArtifact = await urlToArtifact("https://example.com/artifact.json");

// Or from a file (Bun shown here)
const buffer = await Bun.file("artifact.json").arrayBuffer();
const artifact = await fileToArtifact(Buffer.from(buffer), {
  mimeType: "application/json",
});

const result = await extract({
  artifacts: [artifact],
  schema,
  strategy: simple({ model }),
});

Note: Struktur expects pre-parsed artifacts. It doesn’t parse PDFs or HTML directly. You’ll need to convert documents to the artifact format using custom providers.
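
Because conversion is left to you, here is a minimal sketch of turning an HTML string into a text artifact by hand. It does not use Struktur's provider API, which this guide does not describe. `TextArtifact` and `htmlToTextArtifact` are hypothetical names, and the regex-based tag stripping is deliberately naive: it does not decode HTML entities and is no substitute for a real HTML parser.

```typescript
// Hypothetical local type mirroring the artifact fields used in this guide.
type TextArtifact = {
  id: string;
  type: "text";
  raw: () => Promise<Buffer>;
  contents: { text: string }[];
};

// Naive HTML-to-text conversion: drop script/style blocks, strip the
// remaining tags, and collapse whitespace. Entities are left untouched.
function htmlToTextArtifact(id: string, html: string): TextArtifact {
  const text = html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();

  return {
    id,
    type: "text",
    // Assumption: raw() should return the original, unconverted bytes.
    raw: async () => Buffer.from(html, "utf8"),
    contents: [{ text }],
  };
}

const page = htmlToTextArtifact(
  "page-1",
  "<h1>Title</h1><p>Hello <b>world</b></p>",
);
console.log(page.contents[0].text); // "Title Hello world"
```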

Tracking progress

Use event handlers to monitor extraction progress:
const result = await extract({
  artifacts,
  schema,
  strategy: parallel({ model, mergeModel: model, chunkSize: 10_000 }),
  events: {
    onStep: ({ step, total, label }) => {
      console.log(`Step ${step}/${total}: ${label}`);
    },
    onProgress: ({ current, total, percent }) => {
      console.log(`Progress: ${percent}%`);
    },
    onTokenUsage: ({ inputTokens, outputTokens, totalTokens }) => {
      console.log(`Tokens: ${totalTokens}`);
    },
    onMessage: ({ role, content }) => {
      console.log(`[${role}]`, content);
    },
  },
});
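
Logging each event is useful during development, but you may want a total at the end of a run. The sketch below accumulates token usage across events. `TokenUsage` and `createUsageTracker` are hypothetical names that mirror the fields destructured in `onTokenUsage` above, and the code assumes each event reports the usage of a single model call rather than a running total.

```typescript
// Hypothetical payload type mirroring the fields used in onTokenUsage above.
type TokenUsage = {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
};

// Returns a handler plus the running totals it updates.
function createUsageTracker() {
  const totals: TokenUsage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };
  return {
    totals,
    // Assumption: each event carries per-call usage, not a cumulative count.
    onTokenUsage: (usage: TokenUsage) => {
      totals.inputTokens += usage.inputTokens;
      totals.outputTokens += usage.outputTokens;
      totals.totalTokens += usage.totalTokens;
    },
  };
}

const tracker = createUsageTracker();

// Simulate two events with made-up numbers to show the accumulation.
tracker.onTokenUsage({ inputTokens: 1200, outputTokens: 80, totalTokens: 1280 });
tracker.onTokenUsage({ inputTokens: 900, outputTokens: 60, totalTokens: 960 });

console.log(tracker.totals);
// { inputTokens: 2100, outputTokens: 140, totalTokens: 2240 }
```

In a real run you would pass `tracker.onTokenUsage` as the `onTokenUsage` handler in `events` and read `tracker.totals` after extract() resolves.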

Next steps

Core concepts

Learn about extraction strategies and when to use each

API reference

Explore the complete API documentation

Examples

See real-world examples and patterns

CLI guide

Use Struktur from the command line
