Introduction
Struktur is a structured data extraction engine that turns pre-parsed artifacts into validated JSON using the Vercel AI SDK. It chunks content by token budgets, runs strategy-driven workflows, validates with Ajv, and merges or dedupes when needed.
Key features
- Strategy-driven extraction - Choose from simple, parallel, sequential, auto-merge, or double-pass strategies
- Token-aware batching - Automatically chunks content by token budgets with optional image limits
- Schema-first validation - Validates results with Ajv and automatically retries on failure
- Smart merging and deduplication - Schema-aware rules with CRC32 hashing for efficient data processing
- Fully typed results - Uses Ajv JSONSchemaType<T> for type-safe extraction results
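The deduplication feature mentions CRC32 hashing. As a rough illustration of the idea, and not Struktur's actual implementation, the sketch below hashes a canonical JSON form of each record with a standard CRC-32 and drops repeats. The helper names `crc32`, `canonicalize`, and `dedupe` are hypothetical and are not part of Struktur's API.

```typescript
// Hypothetical helpers for illustration only; not Struktur's API.

// Lookup table for the standard CRC-32 (reflected polynomial 0xEDB88320).
const CRC_TABLE: number[] = (() => {
  const table: number[] = [];
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    }
    table.push(c >>> 0);
  }
  return table;
})();

// CRC-32 of a string's UTF-8 bytes, as an unsigned 32-bit integer.
function crc32(text: string): number {
  const bytes = new TextEncoder().encode(text);
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Serialize a value with object keys sorted, so that key order
// does not affect the hash.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonicalize(obj[key]));
    return "{" + parts.join(",") + "}";
  }
  return String(JSON.stringify(value));
}

// Keep the first occurrence of each structurally equal record.
function dedupe<T>(items: T[]): T[] {
  const buckets = new Map<number, string[]>();
  const unique: T[] = [];
  for (const item of items) {
    const canonical = canonicalize(item);
    const hash = crc32(canonical);
    const seen = buckets.get(hash) || [];
    // CRC32 can collide, so confirm equality on the canonical form.
    if (seen.indexOf(canonical) !== -1) continue;
    seen.push(canonical);
    buckets.set(hash, seen);
    unique.push(item);
  }
  return unique;
}
```

The hash is only a cheap first-pass key. Comparing the canonical strings within a bucket guards against two different records sharing a 32-bit checksum.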
Struktur operates on pre-parsed artifact DTOs. It does not parse PDFs or HTML directly - it expects normalized JSON inputs.
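To make "normalized JSON inputs" and token-aware batching more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Artifact` and `ContentBlock` shapes, the `batchByBudget` helper, and the four-characters-per-token estimate are hypothetical and are not taken from Struktur. Consult the API reference for the real DTO types and options.

```typescript
// Hypothetical artifact shape; the real Struktur DTOs may differ.
interface ContentBlock {
  text: string;
  images?: string[]; // e.g. URLs or base64 payloads
}

interface Artifact {
  id: string;
  contents: ContentBlock[];
}

interface BatchOptions {
  maxTokens: number;
  maxImages?: number; // optional image limit per batch
}

// Crude token estimate (assumption): roughly four characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Greedily pack blocks into batches that stay within the token budget
// and, if given, the image limit. A single oversized block is not
// split; it simply gets a batch of its own.
function batchByBudget(
  blocks: ContentBlock[],
  options: BatchOptions
): ContentBlock[][] {
  const batches: ContentBlock[][] = [];
  let current: ContentBlock[] = [];
  let tokens = 0;
  let images = 0;

  for (const block of blocks) {
    const blockTokens = estimateTokens(block.text);
    const blockImages = block.images ? block.images.length : 0;
    const overTokens = tokens + blockTokens > options.maxTokens;
    const overImages =
      options.maxImages !== undefined &&
      images + blockImages > options.maxImages;

    // Close the current batch before it would exceed a limit.
    if (current.length > 0 && (overTokens || overImages)) {
      batches.push(current);
      current = [];
      tokens = 0;
      images = 0;
    }
    current.push(block);
    tokens += blockTokens;
    images += blockImages;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Usage with a made-up artifact.
const artifact: Artifact = {
  id: "invoice-001",
  contents: [
    { text: "Invoice 001 from Example GmbH" },
    { text: "Line items: 2x widget, 1x gadget", images: ["page-1.png"] },
  ],
};
const batches = batchByBudget(artifact.contents, { maxTokens: 1000, maxImages: 4 });
```

A production chunker would use the model's real tokenizer instead of a character heuristic, since the ratio of characters to tokens varies by language and content.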
Get started
Installation
Install Struktur with your preferred package manager
Quick start
Get up and running with a working extraction example
Extraction strategies
Learn about the different extraction strategies available
API reference
Explore the complete API documentation
How it works
Struktur follows a clear extraction pipeline:
- Input artifacts - Provide pre-parsed JSON artifacts with text and media content
- Select strategy - Choose an extraction strategy based on your content size and complexity
- Define schema - Specify the output structure using JSON Schema
- Extract data - Struktur chunks content, runs LLM extraction, validates results, and merges outputs
- Get typed results - Receive validated, type-safe extraction results
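The extract, validate, retry, and merge steps above can be sketched in miniature as follows. This is an illustrative outline under stated assumptions, not Struktur's implementation: `extractWithRetry`, `runPipeline`, `ExtractFn`, and the `Invoice` type are hypothetical names, the extractor is a stub standing in for an LLM call, and the type guard stands in for an Ajv-compiled validator. The code is synchronous for brevity, whereas a real model call would be asynchronous.

```typescript
// Hypothetical names throughout; none of this is Struktur's API.
type Validator<T> = (value: unknown) => value is T;
type ExtractFn = (chunk: string, attempt: number) => unknown;

// Call the extractor until the validator accepts its output,
// giving up after maxAttempts tries.
function extractWithRetry<T>(
  chunk: string,
  extract: ExtractFn,
  isValid: Validator<T>,
  maxAttempts: number = 3
): T {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = extract(chunk, attempt);
    if (isValid(candidate)) return candidate;
  }
  throw new Error("No valid result after " + maxAttempts + " attempts");
}

// Extract from every chunk, then fold the per-chunk results together.
function runPipeline<T>(
  chunks: string[],
  extract: ExtractFn,
  isValid: Validator<T>,
  merge: (left: T, right: T) => T
): T {
  if (chunks.length === 0) throw new Error("No chunks to process");
  const results = chunks.map((chunk) =>
    extractWithRetry(chunk, extract, isValid)
  );
  return results.reduce(merge);
}

// Usage with a stub extractor that fails validation on its first try.
interface Invoice {
  items: string[];
}

// Shallow check only: confirms `items` is an array, not its element types.
const isInvoice = (value: unknown): value is Invoice =>
  typeof value === "object" &&
  value !== null &&
  Array.isArray((value as Invoice).items);

const fakeExtract: ExtractFn = (chunk, attempt) =>
  attempt === 1 ? { wrong: true } : { items: [chunk.trim()] };

const merged = runPipeline(
  ["widget ", " gadget"],
  fakeExtract,
  isInvoice,
  (left, right) => ({ items: left.items.concat(right.items) })
);
// merged.items is ["widget", "gadget"]
```

Passing the validator and merge function in as parameters keeps the loop independent of any one schema, which mirrors the schema-first approach described above.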