The simple strategy is the most straightforward way to extract data from artifacts. It sends all content in a single request to the LLM and returns the validated result.

Basic example

This example extracts a title from a single-page document:
import { extract, simple, type Artifact } from "@mateffy/struktur";
import type { JSONSchemaType } from "ajv";
import { google } from "@ai-sdk/google";

type Output = {
  title: string;
};

const schema: JSONSchemaType<Output> = {
  type: "object",
  properties: {
    title: { type: "string" },
  },
  required: ["title"],
  additionalProperties: false,
};

const artifacts: Artifact[] = [
  {
    id: "doc-1",
    type: "pdf",
    raw: async () => Buffer.from(""),
    contents: [{ page: 1, text: "Title: Example Document" }],
  },
];

const result = await extract({
  artifacts,
  schema,
  strategy: simple({
    model: google("gemini-2.0-flash-exp"),
  }),
});

console.log(result.data.title); // "Example Document"

When to use simple strategy

The simple strategy is best for:
  • Small documents that fit within the model’s context window
  • Single-page content like web pages or short PDFs
  • Quick prototypes where you want minimal configuration
  • Low latency requirements (single request)
The simple strategy doesn’t perform any chunking. If your content exceeds the model’s context limit, use parallel or sequential instead.
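One way to decide up front whether content is likely to fit in a single request is to estimate its token count. The sketch below is not part of @mateffy/struktur: `estimateTokens`, `fitsInSingleRequest`, the `ArtifactLike` type, the 4-characters-per-token ratio, and the reserved-token budget are all assumptions made for illustration. Real tokenizers differ by model and language, so treat the result as a rough guide only.

```typescript
// Hypothetical helpers for illustration; not part of @mateffy/struktur.
// Local stand-in for the artifact shape used in the examples on this page.
type ArtifactLike = {
  id: string;
  contents: { page?: number; text: string }[];
};

// Assumed heuristic: roughly 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

// Sums the text length of every content entry, because the simple strategy
// sends all content in one request.
function estimateTokens(artifacts: ArtifactLike[]): number {
  let chars = 0;
  for (const artifact of artifacts) {
    for (const content of artifact.contents) {
      chars += content.text.length;
    }
  }
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// True when the estimated input plus a reserved budget (instructions, schema,
// model output) stays within the model's context window.
function fitsInSingleRequest(
  artifacts: ArtifactLike[],
  contextWindowTokens: number,
  reservedTokens = 2000,
): boolean {
  return estimateTokens(artifacts) + reservedTokens <= contextWindowTokens;
}

const sampleArtifacts: ArtifactLike[] = [
  { id: "doc-1", contents: [{ page: 1, text: "Title: Example Document" }] },
];

// 23 characters give an estimate of 6 tokens, so this small document fits.
const useSimple = fitsInSingleRequest(sampleArtifacts, 8000);
```

If the check returns false, that is a signal to reach for the parallel or sequential strategy instead.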

Extracting structured data

Extract multiple fields with nested objects:
import { extract, simple } from "@mateffy/struktur";
import type { JSONSchemaType } from "ajv";
import { anthropic } from "@ai-sdk/anthropic";

type Product = {
  name: string;
  price: number;
  specs: {
    weight?: number;
    dimensions?: string;
  };
};

const schema: JSONSchemaType<Product> = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number" },
    specs: {
      type: "object",
      properties: {
        weight: { type: "number", nullable: true },
        dimensions: { type: "string", nullable: true },
      },
      required: [],
      additionalProperties: false,
    },
  },
  required: ["name", "price", "specs"],
  additionalProperties: false,
};

const artifacts = [{
  id: "product",
  type: "text",
  raw: async () => Buffer.from(""),
  contents: [{
    text: "Laptop Pro 15. Price: $1299. Weight: 4.2 lbs. Size: 14 x 9.8 x 0.6 inches"
  }],
}];

const result = await extract({
  artifacts,
  schema,
  strategy: simple({
    model: anthropic("claude-3-5-sonnet-20241022"),
  }),
});

console.log(result.data);
// {
//   name: "Laptop Pro 15",
//   price: 1299,
//   specs: { weight: 4.2, dimensions: "14 x 9.8 x 0.6 inches" }
// }

Custom output instructions

Add instructions to guide extraction:
// Reuses `artifacts`, `schema`, and the `google` import from the basic example.
const result = await extract({
  artifacts,
  schema,
  strategy: simple({
    model: google("gemini-2.0-flash-exp"),
    outputInstructions: "Extract prices in USD. Round to 2 decimal places.",
  }),
});

Handling validation errors

The simple strategy validates results with Ajv and retries on failure:
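import { extract, simple } from "@mateffy/struktur";
import { google } from "@ai-sdk/google";

// `artifacts` and `schema` are defined as in the basic example.
const model = google("gemini-2.0-flash-exp");

try {
  const result = await extract({
    artifacts,
    schema,
    strategy: simple({ model }),
    events: {
      // Log each message as it is emitted, including validation retries.
      onMessage: ({ role, content }) => {
        console.log(`[${role}]`, content);
      },
    },
  });

  console.log("Extracted:", result.data);
} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it before reading fields.
  const err = error as { name?: string; errors?: unknown };
  if (err.name === "SchemaValidationError") {
    console.error("Validation failed:", err.errors);
  } else {
    console.error("Extraction failed:", error);
  }
}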
Use the onMessage event to see validation retry attempts and understand why extraction might be failing.
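To make the validate-and-retry behaviour easier to picture, here is a conceptual sketch of such a loop. It is not the library's implementation: `extractWithRetry`, `ModelCall`, `makeFakeModel`, the three-attempt limit, and the wording of the feedback message are all hypothetical, and a hand-written validator stands in for Ajv so the example runs without dependencies.

```typescript
// Conceptual sketch only; every name here is hypothetical.
type Message = { role: "user" | "assistant"; content: string };

// A model is anything that turns a conversation into raw text.
type ModelCall = (messages: Message[]) => Promise<string>;

// Returns undefined instead of throwing when the reply is not valid JSON.
function safeParse(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return undefined;
  }
}

async function extractWithRetry<T>(
  callModel: ModelCall,
  prompt: string,
  validate: (value: unknown) => string[], // empty array means valid
  maxAttempts = 3,
  onMessage?: (message: Message) => void,
): Promise<T> {
  const messages: Message[] = [];
  const push = (message: Message) => {
    messages.push(message);
    onMessage?.(message);
  };

  push({ role: "user", content: prompt });
  let lastErrors: string[] = [];

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(messages);
    push({ role: "assistant", content: raw });

    const parsed = safeParse(raw);
    lastErrors = validate(parsed);
    if (lastErrors.length === 0) return parsed as T;

    // Feed the validation errors back so the next attempt can correct them.
    if (attempt < maxAttempts) {
      push({ role: "user", content: `Fix these errors: ${lastErrors.join("; ")}` });
    }
  }

  throw new Error(
    `Validation failed after ${maxAttempts} attempts: ${lastErrors.join("; ")}`,
  );
}

// Hand-written validator standing in for Ajv in this sketch.
function validateTitle(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["expected an object"];
  const title = (value as { title?: unknown }).title;
  return typeof title === "string" ? [] : ["title must be a string"];
}

// Fake model that replays canned replies, repeating the last one if needed.
function makeFakeModel(replies: string[]): ModelCall {
  let index = 0;
  return async () => replies[Math.min(index++, replies.length - 1)] ?? "";
}
```

With a fake model that first replies `{"title": 42}` and then `{"title": "Example Document"}`, the loop rejects the first reply, sends the error back, and resolves with the second. An `onMessage` callback passed to this sketch would see each of those messages in order, which mirrors how the event is used in the example above.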

Next steps

Parallel strategy

Process large documents with concurrent chunking

Sequential strategy

Build context incrementally for long documents
