Artifacts

Artifacts are the fundamental input to Struktur’s extraction engine. They are JSON-serializable DTOs (Data Transfer Objects) that represent pre-parsed content as normalized text and media slices. Struktur does not parse PDFs, HTML, or other formats directly; it expects you to provide normalized artifacts.

What is an artifact?

An artifact is a TypeScript object that implements the Artifact interface:

export interface Artifact {
  id: string;
  type: ArtifactType;
  raw: () => Promise<Buffer>;
  contents: ArtifactContent[];
  metadata?: Record<string, unknown>;
  tokens?: number;
}

Each artifact contains:

id: Unique identifier for tracking and debugging
type: One of "text", "image", "pdf", or "file"
raw: Async function that resolves to the original buffer (for caching/debugging)
contents: Array of content slices with text and/or media
metadata: Optional key-value metadata
tokens: Optional pre-computed token count

Artifact contents

The contents array contains ArtifactContent objects that represent slices of the artifact:

export type ArtifactContent = {
  page?: number;
  text?: string;
  media?: ArtifactImage[];
};

Each content slice can have:

page: Optional page number (useful for PDFs)
text: Text content for this slice
media: Array of embedded images with position/size data
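As a rough illustration of working with content slices, the sketch below flattens an array of slices into a single string. The helper name `contentsToText` and the `[page N]` label format are hypothetical and not part of Struktur; the slice type is a local copy of the shape shown above, with `media` left loosely typed.

```typescript
// Local copy of the slice shape shown above; `media` is left loose here.
type ArtifactContent = {
  page?: number;
  text?: string;
  media?: unknown[];
};

// Hypothetical helper (not a Struktur API): join the text of all slices,
// prefixing slices that carry a page number with a "[page N]" label.
function contentsToText(contents: ArtifactContent[]): string {
  return contents
    .filter((slice) => typeof slice.text === "string" && slice.text.length > 0)
    .map((slice) =>
      slice.page !== undefined ? `[page ${slice.page}] ${slice.text}` : slice.text
    )
    .join("\n\n");
}

const preview = contentsToText([
  { page: 1, text: "Invoice 2024-001" },
  { page: 2, media: [] }, // media-only slice: no text, so it is skipped
  { text: "Thank you for your business." },
]);
```

Because every field on a slice is optional, code that consumes slices generally needs to handle text-only, media-only, and mixed slices.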

Images are represented as:

export type ArtifactImage = {
  type: "image";
  url?: string;
  base64?: string;
  contents?: Buffer;
  text?: string;
  x?: number;
  y?: number;
  width?: number;
  height?: number;
};
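Since an image can carry its data as a `url`, a `base64` string, or a `contents` buffer, consumers need some rule for choosing among them. The sketch below shows one possible rule. The helper `imageSource` and its precedence order (base64, then contents, then url) are assumptions made for illustration, not documented Struktur behavior.

```typescript
import { Buffer } from "node:buffer";

// Local copy of the image shape shown above.
type ArtifactImage = {
  type: "image";
  url?: string;
  base64?: string;
  contents?: Buffer;
  text?: string;
  x?: number;
  y?: number;
  width?: number;
  height?: number;
};

type ImageSource = { kind: "base64" | "url"; value: string };

// Hypothetical helper (not a Struktur API). The precedence below is an
// assumption: prefer inline data, fall back to the remote URL.
function imageSource(image: ArtifactImage): ImageSource | null {
  if (image.base64) return { kind: "base64", value: image.base64 };
  if (image.contents) {
    return { kind: "base64", value: image.contents.toString("base64") };
  }
  if (image.url) return { kind: "url", value: image.url };
  return null; // the image carries no data at all
}

const inline = imageSource({ type: "image", contents: Buffer.from("hi") });
const remote = imageSource({ type: "image", url: "https://example.com/fig1.png" });
const empty = imageSource({ type: "image" });
```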

Creating artifacts

Struktur provides several helper functions to create artifacts from different sources.

From JSON URLs

Load pre-serialized artifacts from a URL:

import { urlToArtifact } from "@mateffy/struktur";

const artifact = await urlToArtifact("https://example.com/artifact.json");

The JSON must conform to the serialized artifact schema (without the raw function and with images as URLs or base64).
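The exact serialized schema is not reproduced on this page, so the following is only a sketch of what producing such JSON might look like, based on the description above: drop `raw` and replace binary image `contents` with base64. The names `SerializedArtifactSketch`, `serializeImage`, and `serializeArtifact` are hypothetical, and the types are local copies of the shapes shown earlier.

```typescript
import { Buffer } from "node:buffer";

// Local copies of the shapes shown earlier (image position fields omitted).
type ArtifactImage = {
  type: "image";
  url?: string;
  base64?: string;
  contents?: Buffer;
  text?: string;
};
type ArtifactContent = { page?: number; text?: string; media?: ArtifactImage[] };
type Artifact = {
  id: string;
  type: "text" | "image" | "pdf" | "file";
  raw: () => Promise<Buffer>;
  contents: ArtifactContent[];
  metadata?: Record<string, unknown>;
  tokens?: number;
};

// Assumed serialized shape: no `raw`, no binary `contents` on images.
type SerializedImageSketch = Omit<ArtifactImage, "contents">;
type SerializedArtifactSketch = {
  id: string;
  type: Artifact["type"];
  contents: { page?: number; text?: string; media?: SerializedImageSketch[] }[];
  metadata?: Record<string, unknown>;
  tokens?: number;
};

// Inline binary image data as base64 unless a URL or base64 string exists.
function serializeImage(image: ArtifactImage): SerializedImageSketch {
  const { contents, ...rest } = image;
  if (contents && !rest.base64 && !rest.url) {
    return { ...rest, base64: contents.toString("base64") };
  }
  return rest;
}

// Copy everything except `raw`; JSON.stringify drops undefined fields.
function serializeArtifact(artifact: Artifact): SerializedArtifactSketch {
  return {
    id: artifact.id,
    type: artifact.type,
    contents: artifact.contents.map((slice) => ({
      ...slice,
      media: slice.media?.map(serializeImage),
    })),
    metadata: artifact.metadata,
    tokens: artifact.tokens,
  };
}

const json = JSON.stringify(
  serializeArtifact({
    id: "doc-1",
    type: "file",
    raw: async () => Buffer.from(""),
    contents: [
      { text: "Figure 1", media: [{ type: "image", contents: Buffer.from("hi") }] },
    ],
  })
);
```

The resulting string could then be hosted at a URL and loaded with `urlToArtifact`, provided it matches the schema Struktur actually enforces.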

From files

Convert files to artifacts using MIME type detection:

import { fileToArtifact } from "@mateffy/struktur";

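// Bun.file is specific to the Bun runtime; on Node.js, fs.promises.readFile returns a Buffer directly
const buffer = await Bun.file("document.txt").arrayBuffer();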
const artifact = await fileToArtifact(Buffer.from(buffer), {
  mimeType: "text/plain"
});

For plain text files, Struktur automatically splits text into content blocks by paragraph breaks:

const splitTextIntoContents = (text: string): ArtifactContent[] => {
  const blocks = text
    .split(/\n\s*\n/g)
    .map((block) => block.trim())
    .filter((block) => block.length > 0);

  return blocks.map((block) => ({ text: block }));
};
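To make the splitting rule concrete, the example below runs the same logic on a small sample string. The function body is repeated so the snippet runs on its own; the sample text is made up.

```typescript
type ArtifactContent = { page?: number; text?: string };

// Same splitting logic as shown above, repeated so the example is standalone.
const splitTextIntoContents = (text: string): ArtifactContent[] =>
  text
    .split(/\n\s*\n/g) // a blank line (possibly containing whitespace) ends a block
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) => ({ text: block }));

const sample =
  "First paragraph.\n\nSecond paragraph,\nstill second.\n\n\n  \nThird.";

const blocks = splitTextIntoContents(sample);
// blocks has three entries. The single newline inside the second paragraph
// is kept, and the run of blank lines before "Third." counts as one break.
```

A single newline does not start a new block, so hard-wrapped paragraphs stay together.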

Using providers

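For file types that need custom parsing (like PDFs), pass an artifact provider keyed by MIME type:

import { fileToArtifact, type ArtifactProvider } from "@mateffy/struktur";

// Placeholder provider: a real one would parse `buffer` with a PDF library
// and return one content entry per page.
const pdfProvider: ArtifactProvider = async (buffer) => ({
  id: "pdf-1",
  type: "pdf",
  raw: async () => buffer,
  contents: [
    { page: 1, text: "Page 1 text..." },
    { page: 2, text: "Page 2 text..." }
  ]
});

// Map from MIME type to the provider that handles it
const providers = {
  "application/pdf": pdfProvider
};

// `buffer` holds the PDF file's bytes as a Buffer
const artifact = await fileToArtifact(buffer, {
  mimeType: "application/pdf",
  providers
});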

Providers are passed as a plain object on each call, so different tenants or requests can use different provider sets.
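One way to take advantage of this in a multi-tenant setup is to build the provider map per tenant by merging a shared set with tenant-specific overrides. The sketch below is illustrative only: `ProviderMap`, `sharedProviders`, `tenantOverrides`, and `providersForTenant` are hypothetical names, the tenant IDs are invented, and the `ArtifactProvider` signature is inferred from the example above rather than taken from the library's type definitions.

```typescript
import { Buffer } from "node:buffer";

// Local copy of the Artifact shape shown earlier (media omitted for brevity).
type Artifact = {
  id: string;
  type: "text" | "image" | "pdf" | "file";
  raw: () => Promise<Buffer>;
  contents: { page?: number; text?: string }[];
};

// Assumed provider signature, inferred from the pdfProvider example.
type ArtifactProvider = (buffer: Buffer) => Promise<Artifact>;
type ProviderMap = Record<string, ArtifactProvider>;

// Providers available to every tenant (placeholder implementation).
const sharedProviders: ProviderMap = {
  "application/pdf": async (buffer) => ({
    id: "pdf",
    type: "pdf",
    raw: async () => buffer,
    contents: [],
  }),
};

// Hypothetical per-tenant additions, keyed by tenant ID.
const tenantOverrides: Record<string, ProviderMap> = {
  acme: {
    "text/csv": async (buffer) => ({
      id: "csv",
      type: "file",
      raw: async () => buffer,
      contents: [{ text: buffer.toString("utf8") }],
    }),
  },
};

// Merge shared and tenant-specific providers; tenant entries win on conflict.
function providersForTenant(tenantId: string): ProviderMap {
  return { ...sharedProviders, ...(tenantOverrides[tenantId] ?? {}) };
}

const acmeProviders = providersForTenant("acme");
const otherProviders = providersForTenant("globex");
```

The merged map can then be passed as the `providers` option to `fileToArtifact`, as in the example above.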

From text

Create a simple text artifact:

import { parseInputToArtifacts } from "@mateffy/struktur";

const artifacts = await parseInputToArtifacts({
  kind: "text",
  text: "Your content here",
  id: "custom-id" // optional
});

Common use cases

Single-page document

For simple text documents, create a single artifact with one content entry:

// `text` holds the document's full text as a string
const artifact: Artifact = {
  id: "doc-1",
  type: "text",
  raw: async () => Buffer.from(text),
  contents: [{ text }]
};

Multi-page PDF

For PDFs, create one content entry per page:

// `pages` is an array of per-page text strings; `pdfBuffer` is the original file
const artifact: Artifact = {
  id: "pdf-1",
  type: "pdf",
  raw: async () => pdfBuffer,
  contents: pages.map((text, i) => ({
    page: i + 1,
    text
  }))
};

Document with images

Embed images in content slices where they appear:

const artifact: Artifact = {
  id: "doc-1",
  type: "file",
  raw: async () => buffer,
  contents: [
    { text: "Introduction..." },
    {
      text: "Figure 1 shows...",
      media: [{
        type: "image",
        url: "https://example.com/fig1.png"
      }]
    }
  ]
};

Validation

Struktur validates serialized artifacts against a JSON schema before hydration:

import { validateSerializedArtifacts } from "@mateffy/struktur";

try {
  const artifacts = validateSerializedArtifacts(jsonData);
  // artifacts is now typed as SerializedArtifact[]
} catch (error) {
  // SchemaValidationError with detailed errors
}
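To give a feel for the kind of structural check that validation performs, here is a hand-written type guard over a simplified shape. It is not Struktur's schema and is no substitute for `validateSerializedArtifacts`; `SerializedArtifactSketch` and `looksLikeSerializedArtifact` are hypothetical names, and the fields checked are only those described on this page.

```typescript
// Simplified, assumed shape of a serialized artifact (no `raw` function).
type SerializedArtifactSketch = {
  id: string;
  type: "text" | "image" | "pdf" | "file";
  contents: { page?: number; text?: string; media?: unknown[] }[];
};

const ARTIFACT_TYPES = ["text", "image", "pdf", "file"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical structural check; the real schema may be stricter or looser.
function looksLikeSerializedArtifact(
  value: unknown
): value is SerializedArtifactSketch {
  if (!isRecord(value)) return false;
  if (typeof value.id !== "string") return false;
  if (typeof value.type !== "string" || !ARTIFACT_TYPES.includes(value.type)) {
    return false;
  }
  if (!Array.isArray(value.contents)) return false;
  return value.contents.every(
    (slice) =>
      isRecord(slice) &&
      (slice.page === undefined || typeof slice.page === "number") &&
      (slice.text === undefined || typeof slice.text === "string") &&
      (slice.media === undefined || Array.isArray(slice.media))
  );
}

const accepted = looksLikeSerializedArtifact({
  id: "doc-1",
  type: "text",
  contents: [{ text: "Hello" }],
});
const rejected = looksLikeSerializedArtifact({
  id: "doc-1",
  type: "video", // not one of the four artifact types
  contents: [],
});
```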

Design philosophy

Artifacts are designed to be:

Format-agnostic: The same structure works for text, PDFs, images, and custom formats
Serializable: Can be saved to JSON and loaded via urlToArtifact
Normalized: All content is pre-parsed into consistent text/media slices
Extensible: Use providers to add support for any format

This separation of concerns means Struktur focuses on extraction logic, not parsing: you bring normalized data, and Struktur extracts structured results.
