Quickstart

This guide will get you extracting structured data from artifacts in minutes.

Prerequisites

TypeScript 5.x or later
Node.js, Bun, or another JavaScript runtime
An API key for OpenAI, Anthropic, Google AI, or OpenRouter

Basic extraction

Here’s a complete example that extracts a title from an artifact:

import { extract, simple } from "@mateffy/struktur";
import type { JSONSchemaType } from "ajv";
import { google } from "@ai-sdk/google";

// Define your output type
type Output = { title: string };

// Create a JSON schema for validation
const schema: JSONSchemaType<Output> = {
  type: "object",
  properties: { title: { type: "string" } },
  required: ["title"],
  additionalProperties: false,
};

// Create an artifact with some text
const artifacts = [
  {
    id: "doc-1",
    type: "text" as const,
    raw: async () => Buffer.from(""),
    contents: [{ text: "Document Title: Getting Started with Struktur" }],
  },
];

// Extract structured data
const result = await extract({
  artifacts,
  schema,
  strategy: simple({ model: google("gemini-2.0-flash-exp") }),
});

console.log(result.data.title);
// Output: "Getting Started with Struktur"

Understanding the components

Define your output type

Create a TypeScript type for the data you want to extract:

type Output = { title: string };

Create a JSON schema

Use Ajv’s JSONSchemaType for type-safe validation:

const schema: JSONSchemaType<Output> = {
  type: "object",
  properties: { title: { type: "string" } },
  required: ["title"],
  additionalProperties: false,
};

Prepare your artifacts

Artifacts are normalized document representations with text and optional media:

const artifacts = [{
  id: "doc-1",
  type: "text" as const,
  raw: async () => Buffer.from(""),
  contents: [{ text: "Your document text" }],
}];

Choose a strategy

Pick an extraction strategy based on your document size. The simple strategy shown here suits input that fits within the model’s context limit:

strategy: simple({ model: google("gemini-2.0-flash-exp") })
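
One way to make that choice explicit is a small size check before calling extract(). The sketch below is illustrative only: estimateTokens, pickStrategy, the 4-characters-per-token rule of thumb, and the 10,000-token threshold are all assumptions made for this example, not part of Struktur.

```typescript
// Hypothetical helper, not part of Struktur: rough token estimate for plain text.
// Assumes about 4 characters per token, a common rule of thumb for English.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

type StrategyName = "simple" | "parallel";

// Returns which strategy to reach for, based on the combined size of all
// text blocks. The default budget is an arbitrary illustrative value.
function pickStrategy(texts: string[], tokenBudget = 10_000): StrategyName {
  const total = texts.reduce((sum, text) => sum + estimateTokens(text), 0);
  return total <= tokenBudget ? "simple" : "parallel";
}

console.log(pickStrategy(["Document Title: Getting Started with Struktur"]));
// Output: "simple"
```

In practice the threshold would depend on the context window of the model you pass to the strategy.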

Extract and validate

Call extract() to get validated, type-safe results:

const result = await extract({ artifacts, schema, strategy });
console.log(result.data); // Fully typed!

Extracting complex data

Extract nested objects and arrays:

import { extract, simple } from "@mateffy/struktur";
import type { JSONSchemaType } from "ajv";
import { anthropic } from "@ai-sdk/anthropic";

type Product = {
  name: string;
  price: number;
  features: string[];
};

const schema: JSONSchemaType<Product> = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number" },
    features: { type: "array", items: { type: "string" } },
  },
  required: ["name", "price", "features"],
  additionalProperties: false,
};

const artifacts = [{
  id: "product",
  type: "text" as const,
  raw: async () => Buffer.from(""),
  contents: [{
    text: `
      Laptop Pro 15
      Price: $1299
      Features: 16GB RAM, 512GB SSD, 15" Retina Display
    `
  }],
}];

const result = await extract({
  artifacts,
  schema,
  strategy: simple({ model: anthropic("claude-3-5-haiku-20241022") }),
});

console.log(result.data);
// {
//   name: "Laptop Pro 15",
//   price: 1299,
//   features: ["16GB RAM", "512GB SSD", "15\" Retina Display"]
// }

Processing larger documents

For documents that exceed context limits, use the parallel strategy:

import { extract, parallel } from "@mateffy/struktur";
import { google } from "@ai-sdk/google";

const result = await extract({
  artifacts, // Can be multiple artifacts or large documents
  schema,
  strategy: parallel({
    model: google("gemini-2.0-flash-exp"),
    mergeModel: google("gemini-2.0-flash-exp"),
    chunkSize: 10_000,  // Token budget per chunk
    concurrency: 4,      // Process 4 chunks at once
  }),
});
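
To build intuition for chunkSize, here is a minimal sketch of splitting text under a token budget. It is not Struktur's chunker: splitByTokenBudget and the 4-characters-per-token estimate are assumptions for illustration, and the real strategy may split on different boundaries.

```typescript
// Hypothetical illustration, not Struktur's implementation.
// Splits text into paragraph-aligned chunks that stay under a token budget,
// estimating tokens as roughly 4 characters each.
function splitByTokenBudget(text: string, chunkSize: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const maxChars = chunkSize * 4;
  const paragraphs = text.split(/\n\s*\n/);
  const chunks: string[] = [];
  let current = "";

  for (const paragraph of paragraphs) {
    const candidate = current ? current + "\n\n" + paragraph : paragraph;
    if (candidate.length <= maxChars) {
      current = candidate; // Paragraph still fits in the current chunk
      continue;
    }
    if (current) chunks.push(current);
    // A single oversized paragraph is hard-split at the character limit.
    let rest = paragraph;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  if (current) chunks.push(current);
  return chunks;
}

const chunks = splitByTokenBudget("Intro paragraph.\n\nSecond paragraph.", 10_000);
console.log(chunks.length);
// Output: 1
```

With the parallel strategy, chunks like these would be processed up to concurrency at a time and the partial results combined afterwards, presumably by mergeModel.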

Loading artifacts from files

Use urlToArtifact or fileToArtifact to load pre-serialized artifacts:

import { extract, simple, urlToArtifact, fileToArtifact } from "@mateffy/struktur";

// Load from a URL
const remoteArtifact = await urlToArtifact("https://example.com/artifact.json");

// Or from a file (Bun.file is specific to the Bun runtime)
const buffer = await Bun.file("artifact.json").arrayBuffer();
const localArtifact = await fileToArtifact(Buffer.from(buffer), {
  mimeType: "application/json",
});

// schema and model are defined as in the earlier examples
const result = await extract({
  artifacts: [remoteArtifact, localArtifact],
  schema,
  strategy: simple({ model }),
});

Struktur expects pre-parsed artifacts. It doesn’t parse PDFs or HTML directly. You’ll need to convert documents to the artifact format using custom providers.

Tracking progress

Use event handlers to monitor extraction progress:

// artifacts, schema, and model are defined as in the earlier examples
const result = await extract({
  artifacts,
  schema,
  strategy: parallel({ model, mergeModel: model, chunkSize: 10_000 }),
  events: {
    onStep: ({ step, total, label }) => {
      console.log(`Step ${step}/${total}: ${label}`);
    },
    onProgress: ({ current, total, percent }) => {
      console.log(`Progress: ${percent}%`);
    },
    onTokenUsage: ({ inputTokens, outputTokens, totalTokens }) => {
      console.log(`Tokens: ${totalTokens}`);
    },
    onMessage: ({ role, content }) => {
      console.log(`[${role}]`, content);
    },
  },
});
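
If you want a running total rather than per-event logs, the handlers can feed a small accumulator. The sketch below is a hypothetical helper, not part of Struktur: createUsageTracker is an invented name, and it assumes each onTokenUsage event reports the usage of a single model call rather than a cumulative figure.

```typescript
// Payload shape taken from the onTokenUsage handler shown above.
type TokenUsage = {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
};

// Hypothetical helper: sums token usage across events.
// Assumption: every event describes one model call, so values are added up.
function createUsageTracker() {
  const totals: TokenUsage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };
  return {
    onTokenUsage(usage: TokenUsage): void {
      totals.inputTokens += usage.inputTokens;
      totals.outputTokens += usage.outputTokens;
      totals.totalTokens += usage.totalTokens;
    },
    // Returns a copy so callers cannot mutate the internal totals.
    totals: (): TokenUsage => ({ ...totals }),
  };
}

const tracker = createUsageTracker();
tracker.onTokenUsage({ inputTokens: 120, outputTokens: 30, totalTokens: 150 });
tracker.onTokenUsage({ inputTokens: 80, outputTokens: 20, totalTokens: 100 });
console.log(tracker.totals().totalTokens);
// Output: 250
```

To use it, pass events: { onTokenUsage: tracker.onTokenUsage } to extract() and read tracker.totals() once the call resolves. If the events turn out to be cumulative, keep only the latest value instead of summing.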

Next steps

Core concepts

Learn about extraction strategies and when to use each

API reference

Explore the complete API documentation

Examples

See real-world examples and patterns

CLI guide

Use Struktur from the command line
