Overview

Struktur exports TypeScript types for building type-safe extraction workflows. All of them are exported from the main package entry point.

Artifact types

Artifact

The core interface representing a pre-parsed document with text and optional media content.
interface Artifact {
  id: string;
  type: ArtifactType;
  raw: () => Promise<Buffer>;
  contents: ArtifactContent[];
  metadata?: Record<string, unknown>;
  tokens?: number;
}
id
string
required
Unique identifier for this artifact.
type
ArtifactType
required
The type of artifact: "text", "image", "pdf", or "file".
raw
() => Promise<Buffer>
required
Async function that returns the raw buffer of the original source.
contents
ArtifactContent[]
required
Array of content slices. Each slice can contain text and/or media for a specific page or section.
metadata
Record<string, unknown>
Optional metadata associated with the artifact.
tokens
number
Optional pre-calculated token count for this artifact.
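As a minimal sketch of how these fields fit together, the snippet below wraps a plain string as a text artifact. The `textArtifact` helper and the `metadata` value are hypothetical, not part of Struktur, and the types are redeclared locally (with `media` omitted) only so the example stands alone.

```typescript
// Local copies of the documented shapes so the sketch is self-contained;
// in a real project, import these types from the package instead.
type ArtifactType = "text" | "image" | "pdf" | "file";
type ArtifactContent = { page?: number; text?: string }; // media omitted for brevity
interface Artifact {
  id: string;
  type: ArtifactType;
  raw: () => Promise<Buffer>;
  contents: ArtifactContent[];
  metadata?: Record<string, unknown>;
  tokens?: number;
}

// Hypothetical helper (not a Struktur API): wraps a string as a text artifact.
function textArtifact(id: string, text: string): Artifact {
  return {
    id,
    type: "text",
    // raw() is lazy, so the buffer is only created when a caller asks for it
    raw: async () => Buffer.from(text, "utf8"),
    contents: [{ text }],
    metadata: { source: "inline" }, // illustrative value only
  };
}

const note = textArtifact("note-1", "Invoice INV-001, total 42.50");
```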

ArtifactType

Union type defining the possible artifact types.
type ArtifactType = "text" | "image" | "pdf" | "file";

ArtifactContent

Represents a single content slice within an artifact.
type ArtifactContent = {
  page?: number;
  text?: string;
  media?: ArtifactImage[];
};
page
number
Optional page number for multi-page documents.
text
string
Text content for this slice.
media
ArtifactImage[]
Array of images associated with this content slice.
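Because every field on a slice is optional, code that reads slices has to tolerate missing text and page numbers. A possible approach, using a hypothetical `joinText` helper that is not part of Struktur:

```typescript
// Trimmed local copy of the documented shape (media omitted for brevity).
type ArtifactContent = { page?: number; text?: string };

// Hypothetical helper: concatenates slice text in page order.
// Slices without a page sort first; slices without text are skipped.
function joinText(contents: ArtifactContent[]): string {
  return [...contents] // copy so the caller's array is not reordered
    .sort((a, b) => (a.page ?? 0) - (b.page ?? 0))
    .map((slice) => slice.text)
    .filter((text): text is string => typeof text === "string" && text.length > 0)
    .join("\n\n");
}

const combined = joinText([
  { page: 2, text: "Second page" },
  { page: 1, text: "First page" },
  { page: 3 }, // image-only slice, no text
]);
```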

ArtifactImage

Represents an image within artifact content.
type ArtifactImage = {
  type: "image";
  url?: string;
  base64?: string;
  contents?: Buffer;
  text?: string;
  x?: number;
  y?: number;
  width?: number;
  height?: number;
};
type
"image"
required
Fixed value identifying this as an image.
url
string
URL to the image resource.
base64
string
Base64-encoded image data.
contents
Buffer
Raw image buffer.
text
string
Optional alt text or OCR text associated with the image.
x
number
X-coordinate for positioned images.
y
number
Y-coordinate for positioned images.
width
number
Image width in pixels or points.
height
number
Image height in pixels or points.
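An image can carry its data as `url`, `base64`, or `contents`, so consumers need to decide which one to read. The sketch below uses a hypothetical `imageBytes` helper; the precedence it applies (`contents` before `base64`) is an assumption of this example, not documented Struktur behavior.

```typescript
// Local copy of the documented shape so the sketch is self-contained.
type ArtifactImage = {
  type: "image";
  url?: string;
  base64?: string;
  contents?: Buffer;
  text?: string;
  x?: number;
  y?: number;
  width?: number;
  height?: number;
};

// Hypothetical helper: returns the image bytes from whichever inline field is set.
function imageBytes(image: ArtifactImage): Buffer | undefined {
  if (image.contents) return image.contents;
  if (image.base64) return Buffer.from(image.base64, "base64");
  return undefined; // url-only images would need a fetch, which is out of scope here
}

const logo: ArtifactImage = {
  type: "image",
  base64: "aGVsbG8=", // placeholder payload, not real image data
  text: "company logo",
  width: 120,
  height: 40,
};
const bytes = imageBytes(logo);
```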

Extraction types

ExtractionOptions

Configuration object passed to the extract() function.
type ExtractionOptions<T> = {
  artifacts: Artifact[];
  schema: TypedJSONSchema<T> | AnyJSONSchema;
  strategy: ExtractionStrategy<T>;
  events?: ExtractionEvents;
};
artifacts
Artifact[]
required
Array of artifacts to extract from.
schema
TypedJSONSchema<T> | AnyJSONSchema
required
JSON Schema for validation and type inference.
strategy
ExtractionStrategy<T>
required
Strategy instance that defines the extraction workflow.
events
ExtractionEvents
Optional event handlers for progress and debugging.

ExtractionResult

The result returned by extract().
type ExtractionResult<T> = {
  data: T;
  usage: Usage;
  error?: Error;
};
data
T
required
The extracted data, validated against the schema. If extraction failed, this is null (cast to T), so check error before reading it.
usage
Usage
required
Token usage statistics.
error
Error
Error object if extraction failed.
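Since `data` is typed as `T` even when extraction failed, the compiler will not stop you from reading a null value. One way to guard against that is a small hypothetical `unwrap` helper (not a Struktur API) that throws when `error` is set:

```typescript
// Local copies of the documented shapes so the sketch is self-contained.
type Usage = { inputTokens: number; outputTokens: number; totalTokens: number };
type ExtractionResult<T> = { data: T; usage: Usage; error?: Error };

// Hypothetical helper: rethrows the failure so callers never touch a null `data`.
function unwrap<T>(result: ExtractionResult<T>): T {
  if (result.error) throw result.error;
  return result.data;
}

const ok: ExtractionResult<{ total: number }> = {
  data: { total: 42.5 },
  usage: { inputTokens: 100, outputTokens: 20, totalTokens: 120 },
};
const invoice = unwrap(ok);
```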

ExtractionStrategy

Interface that all strategy implementations must satisfy.
interface ExtractionStrategy<T> {
  name: string;
  run(options: ExtractionOptions<T>): Promise<ExtractionResult<T>>;
  getEstimatedSteps?: (artifacts: Artifact[]) => number;
}
name
string
required
Unique identifier for the strategy (e.g., "simple", "parallel").
run
(options: ExtractionOptions<T>) => Promise<ExtractionResult<T>>
required
Executes the extraction workflow.
getEstimatedSteps
(artifacts: Artifact[]) => number
Optional method that returns the estimated number of steps for progress tracking.
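To illustrate the interface, here is a sketch of a stub strategy that returns a fixed value without calling a model, which could be useful in tests. `fixedStrategy` and the name "fixed" are hypothetical; the types are trimmed local copies so the example stands alone.

```typescript
// Trimmed local copies of the documented shapes.
type Usage = { inputTokens: number; outputTokens: number; totalTokens: number };
type Artifact = { id: string; contents: { text?: string }[] }; // reduced for brevity
type ExtractionResult<T> = { data: T; usage: Usage; error?: Error };
interface ExtractionStrategy<T> {
  name: string;
  run(options: ExtractionOptions<T>): Promise<ExtractionResult<T>>;
  getEstimatedSteps?: (artifacts: Artifact[]) => number;
}
type ExtractionOptions<T> = {
  artifacts: Artifact[];
  schema: Record<string, unknown>;
  strategy: ExtractionStrategy<T>;
};

// Hypothetical strategy: ignores its input and resolves with `value`.
function fixedStrategy<T>(value: T): ExtractionStrategy<T> {
  return {
    name: "fixed",
    async run() {
      // No model call is made, so usage is all zeros
      return { data: value, usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 } };
    },
    // One step per artifact is an arbitrary choice for this sketch
    getEstimatedSteps: (artifacts) => artifacts.length,
  };
}

const stub = fixedStrategy({ invoiceNumber: "INV-001" });
```

A real strategy would read `options.artifacts` and `options.schema` inside `run()` and report actual token usage.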

Usage

Token usage statistics.
type Usage = {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
};
inputTokens
number
required
Total input tokens consumed.
outputTokens
number
required
Total output tokens generated.
totalTokens
number
required
Sum of input and output tokens.
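When a workflow makes several calls, their usage can be combined field by field. A minimal sketch with a hypothetical `addUsage` helper (not a Struktur API), which recomputes `totalTokens` as the sum of input and output tokens, as described above:

```typescript
// Local copy of the documented shape so the sketch is self-contained.
type Usage = { inputTokens: number; outputTokens: number; totalTokens: number };

// Hypothetical helper: adds two usage records together.
function addUsage(a: Usage, b: Usage): Usage {
  const inputTokens = a.inputTokens + b.inputTokens;
  const outputTokens = a.outputTokens + b.outputTokens;
  return { inputTokens, outputTokens, totalTokens: inputTokens + outputTokens };
}

const calls: Usage[] = [
  { inputTokens: 120, outputTokens: 30, totalTokens: 150 },
  { inputTokens: 80, outputTokens: 20, totalTokens: 100 },
];
const zero: Usage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };
const combinedUsage = calls.reduce((acc, cur) => addUsage(acc, cur), zero);
```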

Event types

ExtractionEvents

Event handlers for monitoring extraction progress.
type ExtractionEvents = {
  onStep?: (info: StepInfo) => void | Promise<void>;
  onMessage?: (info: MessageInfo) => void | Promise<void>;
  onProgress?: (info: ProgressInfo) => void | Promise<void>;
  onTokenUsage?: (info: TokenUsageInfo) => void | Promise<void>;
};
onStep
(info: StepInfo) => void | Promise<void>
Called at each major step.
onMessage
(info: MessageInfo) => void | Promise<void>
Called when LLM messages are exchanged.
onProgress
(info: ProgressInfo) => void | Promise<void>
Called during batch processing.
onTokenUsage
(info: TokenUsageInfo) => void | Promise<void>
Called after each LLM call with usage stats.

StepInfo

type StepInfo = {
  step: number;
  total?: number;
  label?: string;
};

MessageInfo

type MessageInfo = {
  role: "system" | "user" | "assistant" | "tool";
  content: unknown;
};

ProgressInfo

type ProgressInfo = {
  current: number;
  total: number;
  percent?: number;
};

TokenUsageInfo

type TokenUsageInfo = Usage & {
  model?: string;
};
Extends Usage with an optional model identifier.
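The handlers can be used to collect simple metrics. The sketch below records progress lines and totals tokens per model. The variable names are hypothetical, and because the scale of `percent` is not stated here, the example derives its own percentage from `current` and `total` instead of relying on it.

```typescript
// Trimmed local copies of the documented shapes.
type Usage = { inputTokens: number; outputTokens: number; totalTokens: number };
type TokenUsageInfo = Usage & { model?: string };
type ProgressInfo = { current: number; total: number; percent?: number };
type ExtractionEvents = {
  onProgress?: (info: ProgressInfo) => void | Promise<void>;
  onTokenUsage?: (info: TokenUsageInfo) => void | Promise<void>;
};

const tokensByModel = new Map<string, number>();
const progressLog: string[] = [];

const events: ExtractionEvents = {
  onProgress: ({ current, total }) => {
    // Guard against division by zero when total is 0
    const pct = total > 0 ? Math.round((current / total) * 100) : 0;
    progressLog.push(`${current}/${total} (${pct}%)`);
  },
  onTokenUsage: ({ model = "unknown", totalTokens }) => {
    // model is optional, so calls without one are grouped under "unknown"
    tokensByModel.set(model, (tokensByModel.get(model) ?? 0) + totalTokens);
  },
};
```

An object like this would be passed as the `events` field of `ExtractionOptions`.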

Schema types

TypedJSONSchema

An alias for Ajv’s JSONSchemaType<T>, used for compile-time type inference.
import type { JSONSchemaType } from "ajv";

type TypedJSONSchema<T> = JSONSchemaType<T>;
When you use JSONSchemaType<T>, the extract() function can infer the type of result.data.

AnyJSONSchema

Untyped JSON Schema.
type AnyJSONSchema = Record<string, unknown>;
Use this when you don’t need compile-time type inference.
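With an untyped schema there is no compile-time link between the schema and the data, so a runtime check is one way to recover a usable type. This sketch assumes the extracted value is then handled as `unknown`; the `hasTitle` guard and the schema contents are hypothetical.

```typescript
// Local copy of the documented alias so the sketch is self-contained.
type AnyJSONSchema = Record<string, unknown>;

// Any plain object is accepted; the compiler does not check it against an output type.
const untypedSchema: AnyJSONSchema = {
  type: "object",
  properties: { title: { type: "string" } },
  required: ["title"],
};

// Hypothetical type guard: narrows an unknown value to the shape the schema describes.
function hasTitle(value: unknown): value is { title: string } {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { title?: unknown }).title === "string"
  );
}

const candidate: unknown = { title: "Quarterly report" };
const title = hasTitle(candidate) ? candidate.title : undefined;
```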

Usage example

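// Assumption: extract() is exported from the main entry point alongside the types
import { extract } from "@mateffy/struktur";
import type {
  Artifact,
  ExtractionResult,
  ExtractionStrategy,
} from "@mateffy/struktur";
import type { JSONSchemaType } from "ajv";

// Placeholders for type-checking: supply your own parsed artifacts and strategy instance
declare const artifacts: Artifact[];
declare const strategy: ExtractionStrategy<Invoice>;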

// Define your output type
type Invoice = {
  invoiceNumber: string;
  total: number;
  items: Array<{ description: string; amount: number }>;
};

// Create a typed schema
const schema: JSONSchemaType<Invoice> = {
  type: "object",
  properties: {
    invoiceNumber: { type: "string" },
    total: { type: "number" },
    items: {
      type: "array",
      items: {
        type: "object",
        properties: {
          description: { type: "string" },
          amount: { type: "number" },
        },
        required: ["description", "amount"],
      },
    },
  },
  required: ["invoiceNumber", "total", "items"],
  additionalProperties: false,
};

// result.data is now typed as Invoice
const result: ExtractionResult<Invoice> = await extract({
  artifacts,
  schema,
  strategy,
});

if (!result.error) {
  // TypeScript knows result.data is Invoice
  console.log(result.data.invoiceNumber);
  console.log(result.data.total);
}
