The Struktur CLI provides a complete interface for running extractions, managing provider tokens, and querying available models without writing code.

Installation

The CLI is included with the Struktur package:
bun install @mateffy/struktur
Run directly with bun:
bun struktur --help
Or use the global binary if installed:
struktur --help

Commands

Struktur provides four main commands:
  • extract-file: Extract structured data (default command)
  • verify: Validate artifact JSON format
  • auth: Manage provider API tokens
  • models: List available models

extract-file command

The default command for running extractions.

Basic usage

struktur extract-file \
  --input document.pdf \
  --schema schema.json \
  --model openai/gpt-4o-mini

Input options

Struktur accepts multiple input formats:
# Extract from a file
struktur extract-file \
  --input document.pdf \
  --schema schema.json

Schema options

# Schema from file
struktur extract-file \
  --input document.pdf \
  --schema schema.json

Model selection

# Explicit model
struktur extract-file \
  --input document.pdf \
  --schema schema.json \
  --model anthropic/claude-3-5-haiku-20241022

# Use configured default (if set)
struktur extract-file \
  --input document.pdf \
  --schema schema.json

# Auto-select cheapest from configured providers
# (no --model flag, no configured default)
struktur extract-file \
  --input document.pdf \
  --schema schema.json
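The three cases above imply a precedence order: an explicit --model flag, then a configured default, then automatic selection. The sketch below only illustrates that order in shell. resolve_model is a hypothetical helper and not part of the CLI, and the auto-selection branch returns a placeholder label because the pricing data the CLI uses is not described here.

```shell
# Hypothetical helper that mirrors the documented model-selection order.
#   $1 = value passed via --model (may be empty)
#   $2 = configured default model (may be empty)
resolve_model() {
  flag_model=$1
  default_model=$2
  if [ -n "$flag_model" ]; then
    # 1. An explicit --model flag wins.
    printf 'explicit:%s\n' "$flag_model"
  elif [ -n "$default_model" ]; then
    # 2. Otherwise fall back to the configured default.
    printf 'default:%s\n' "$default_model"
  else
    # 3. Otherwise the cheapest model from the configured providers is used
    #    (placeholder label only; no pricing logic is sketched here).
    printf 'auto:cheapest-configured\n'
  fi
}
```

For example, resolve_model "" "openai/gpt-4o-mini" reports the configured default because no flag value was given.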
Model format: provider/model-name
Examples:
  • openai/gpt-4o-mini
  • anthropic/claude-3-5-haiku-20241022
  • google/gemini-1.5-flash
  • openrouter/anthropic/claude-3.5-sonnet
  • opencode/gpt-5-nano
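Scripts that accept a model identifier sometimes need the provider and the model name separately. Judging by the openrouter example above, everything after the first slash belongs to the model name. A minimal sketch under that assumption, using POSIX parameter expansion (split_model is a hypothetical helper, not part of the CLI):

```shell
# Hypothetical helper: split "provider/model-name" at the FIRST slash.
# Sets the variables `provider` and `model_name`; returns 1 on malformed input.
split_model() {
  model_id=$1
  case $model_id in
    */*) ;;  # contains at least one slash
    *) echo "expected provider/model-name, got: $model_id" >&2; return 1 ;;
  esac
  provider=${model_id%%/*}   # text before the first slash
  model_name=${model_id#*/}  # text after the first slash
  if [ -z "$provider" ] || [ -z "$model_name" ]; then
    echo "expected provider/model-name, got: $model_id" >&2
    return 1
  fi
}
```

Calling split_model openrouter/anthropic/claude-3.5-sonnet leaves provider set to openrouter and model_name set to anthropic/claude-3.5-sonnet.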

Strategy selection

# Simple strategy (default)
struktur extract-file \
  --input document.pdf \
  --schema schema.json

# Parallel strategy
struktur extract-file \
  --input document.pdf \
  --schema schema.json \
  --strategy parallel \
  --chunk-size 10000

# Sequential strategy
struktur extract-file \
  --input document.pdf \
  --schema schema.json \
  --strategy sequential \
  --chunk-size 8000

# Auto-merge strategies
struktur extract-file \
  --input document.pdf \
  --schema schema.json \
  --strategy parallelAutoMerge \
  --chunk-size 10000

# Double-pass strategies
struktur extract-file \
  --input document.pdf \
  --schema schema.json \
  --strategy doublePass \
  --chunk-size 10000
Available strategies:
  • simple: Single-pass extraction (default)
  • parallel: Concurrent batches with LLM merge
  • sequential: Sequential batches with context
  • parallelAutoMerge: Parallel with schema-aware merge
  • sequentialAutoMerge: Sequential with schema-aware merge
  • doublePass: Parallel then sequential refinement
  • doublePassAutoMerge: Auto-merge then sequential refinement
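Strategy names are case-sensitive camelCase strings, so a wrapper script may want to reject typos before starting a long extraction. A small sketch that checks a value against the list above (is_valid_strategy is a hypothetical helper; how the CLI itself reports an unknown strategy is not covered here):

```shell
# Hypothetical helper: succeed only for the strategy names listed above.
is_valid_strategy() {
  case $1 in
    simple|parallel|sequential|parallelAutoMerge|sequentialAutoMerge|doublePass|doublePassAutoMerge)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

A script could then guard the call with is_valid_strategy "$strategy" || exit 1 before invoking struktur extract-file.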

Output options

# Output to stdout (default)
struktur extract-file \
  --input document.pdf \
  --schema schema.json

# Output to file
struktur extract-file \
  --input document.pdf \
  --schema schema.json \
  --output result.json

# Pipe to another command
struktur extract-file \
  --input document.pdf \
  --schema schema.json | jq '.title'

Chunk size configuration

For strategies that support chunking:
struktur extract-file \
  --input large-document.pdf \
  --schema schema.json \
  --strategy parallel \
  --chunk-size 15000
Default chunk size: 10,000 tokens
Adjust based on:
  • Model context window size
  • Document complexity
  • Cost vs. accuracy tradeoffs
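To get a feel for the tradeoff, it helps to estimate how many chunks a document would produce at a given chunk size. The sketch below uses plain ceiling division. It assumes the token count of the document is already known and that chunks are filled evenly, which may not match how Struktur actually splits content. estimate_chunks is a hypothetical helper.

```shell
# Hypothetical helper: rough chunk count via ceiling division.
#   $1 = total tokens in the document (assumed known)
#   $2 = chunk size in tokens (defaults to the documented 10,000)
estimate_chunks() {
  total_tokens=$1
  chunk_size=${2:-10000}
  echo $(( (total_tokens + chunk_size - 1) / chunk_size ))
}
```

Under these assumptions, a 45,000-token document gives 5 chunks at the default size and 3 chunks at --chunk-size 15000, so a larger chunk size means fewer model calls but more content per call.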

Complete example

struktur extract-file \
  --input research-paper.pdf \
  --schema paper-schema.json \
  --model anthropic/claude-3-5-haiku-20241022 \
  --strategy parallel \
  --chunk-size 12000 \
  --output extracted-data.json

auth command

Manage API tokens for AI providers.

auth set

Store a provider token:
1. Set token with command-line argument

struktur auth set --provider openai --token sk-...

2. Or read from stdin

echo "sk-..." | struktur auth set --provider openai --token-stdin

3. Set as default provider

struktur auth set --provider openai --token sk-... --default
This automatically configures the cheapest model from that provider as your default.
Storage options:
# Auto-detect storage (default)
struktur auth set --provider openai --token sk-...

# Force keychain storage (macOS only)
struktur auth set --provider openai --token sk-... --storage keychain

# Force file storage
struktur auth set --provider openai --token sk-... --storage file
Supported providers:
  • openai
  • anthropic
  • google
  • opencode
  • openrouter

auth default

Set or update your default model:
# Use cheapest model from a provider
struktur auth default openai

# Use a specific model
struktur auth default --model anthropic/claude-3-5-haiku-20241022
Output:
{
  "defaultModel": "openai/gpt-4o-mini"
}

auth get

Retrieve a stored token:
# Masked output (default)
struktur auth get --provider openai
# Output: sk-ab...xy12

# Raw token
struktur auth get --provider openai --raw
# Output: sk-abcdef1234567890...

auth list

List all configured providers:
struktur auth list
Output:
{
  "providers": [
    { "provider": "openai", "storage": "keychain" },
    { "provider": "anthropic", "storage": "file" },
    { "provider": "google", "storage": "keychain" }
  ]
}

auth delete

Remove a provider token:
struktur auth delete --provider openai
Output:
{
  "provider": "openai",
  "deleted": true
}

models command

Query available models from providers.

List all providers

struktur models --all
Output:
{
  "providers": [
    {
      "provider": "openai",
      "ok": true,
      "models": [
        "gpt-4o",
        "gpt-4o-mini",
        "gpt-4-turbo",
        "gpt-3.5-turbo"
      ]
    },
    {
      "provider": "anthropic",
      "ok": true,
      "models": [
        "claude-3-5-sonnet-20241022",
        "claude-3-5-haiku-20241022",
        "claude-3-opus-20240229"
      ]
    },
    {
      "provider": "google",
      "ok": false,
      "error": "No token available"
    }
  ]
}
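The ok and error fields make it possible to pick out providers that could not be queried. A minimal sketch with jq, assuming the output shape shown above (failed_providers is a hypothetical helper name; it reads the JSON from stdin):

```shell
# Hypothetical helper: print "provider: error" for every provider whose
# "ok" field is false. Expects `struktur models --all` output on stdin.
failed_providers() {
  jq -r '.providers[] | select(.ok == false) | "\(.provider): \(.error)"'
}
```

Piping the command into the helper, as in struktur models --all | failed_providers, would print google: No token available for the sample output above.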

List specific provider

struktur models --provider openai
Output:
{
  "providers": [
    {
      "provider": "openai",
      "ok": true,
      "models": [
        "gpt-4o",
        "gpt-4o-mini",
        "gpt-4-turbo",
        "gpt-3.5-turbo"
      ]
    }
  ]
}

verify command

Validate artifact JSON format.
# Verify file
struktur verify --input artifacts.json

# Verify from stdin
cat artifacts.json | struktur verify
Output on success:
{
  "valid": true,
  "artifacts": 3
}
Output on failure:
Error: Invalid artifact format: ...

Progress indicators

The CLI automatically shows a progress bar when running in a TTY:
◈ ▰▰▰▰▰▰▰▰▰▰▱▱▱▱▱▱▱▱▱▱ 50% | processing 5/10
Progress is hidden when:
  • Output is redirected to a file
  • Output is piped to another command
  • The CLI runs in non-interactive mode
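Wrapper scripts can apply the same kind of check to decide whether to print their own status messages. The sketch below uses the POSIX test -t 1 to detect whether stdout is a terminal. It illustrates the general technique and is not taken from Struktur's implementation. stdout_mode is a hypothetical helper name.

```shell
# Hypothetical helper: report whether stdout is attached to a terminal.
stdout_mode() {
  if [ -t 1 ]; then
    echo "interactive"
  else
    echo "redirected"
  fi
}
```

When the function is called inside a command substitution or a pipeline, stdout is a pipe rather than a terminal, so it reports redirected.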

Environment variables

The CLI respects standard environment variables:
# Provider API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_GENERATIVE_AI_API_KEY="..."
export OPENROUTER_API_KEY="sk-or-..."
export OPENCODE_API_KEY="..."

# Config directory (default: ~/.config/struktur)
export STRUKTUR_CONFIG_DIR="/custom/path"

# Disable keychain on macOS
export STRUKTUR_DISABLE_KEYCHAIN=1

# Custom keychain service name (default: struktur)
export STRUKTUR_KEYCHAIN_SERVICE="my-app"
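Before running an extraction in CI, it can be useful to check which provider keys are present in the environment. The sketch below pairs each provider with the variable listed above and prints the providers whose variable is non-empty, without printing any key. The pairing is inferred from the variable names, and configured_env_providers is a hypothetical helper; tokens stored with struktur auth set are not covered by this check.

```shell
# Hypothetical helper: list providers whose API key variable is set.
# Never prints the key values themselves.
configured_env_providers() {
  for pair in \
    openai:OPENAI_API_KEY \
    anthropic:ANTHROPIC_API_KEY \
    google:GOOGLE_GENERATIVE_AI_API_KEY \
    openrouter:OPENROUTER_API_KEY \
    opencode:OPENCODE_API_KEY
  do
    provider=${pair%%:*}
    var_name=${pair#*:}
    # eval is safe here because var_name comes from the fixed list above.
    eval "value=\${$var_name:-}"
    if [ -n "$value" ]; then
      echo "$provider"
    fi
  done
}
```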

Complete workflow example

1. Configure your provider

struktur auth set --provider anthropic --token sk-ant-... --default

2. Create a schema file

schema.json
{
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "authors": {
      "type": "array",
      "items": { "type": "string" }
    },
    "abstract": { "type": "string" },
    "citations": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "title": { "type": "string" },
          "year": { "type": "number" }
        },
        "required": ["title"]
      }
    }
  },
  "required": ["title", "authors"],
  "additionalProperties": false
}

3. Run the extraction

struktur extract-file \
  --input research-paper.pdf \
  --schema schema.json \
  --strategy parallel \
  --chunk-size 12000 \
  --output result.json

4. Process the results

# Pretty-print results
jq . result.json

# Extract specific fields
jq '.title, .authors' result.json

# Count citations
jq '.citations | length' result.json

Error handling

The CLI exits with status code 1 on errors and writes error messages to stderr:
struktur extract-file --input missing.pdf --schema schema.json
# Error: ENOENT: no such file or directory
# Exit code: 1

struktur auth get --provider unconfigured
# Error: No token stored for provider: unconfigured
# Exit code: 1
This allows for proper error handling in scripts:
if struktur extract-file --input doc.pdf --schema schema.json > output.json; then
  echo "Extraction succeeded"
  cat output.json | jq .
else
  echo "Extraction failed"
  exit 1
fi
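Because failures are signalled through the exit status, a generic retry wrapper can be layered on top. The sketch below is not a CLI feature. retry and the RETRY_DELAY variable are hypothetical names, and since the CLI uses exit code 1 for all errors, the wrapper cannot tell a transient provider error from a permanent one such as a missing file.

```shell
# Hypothetical helper: run a command up to N times until it succeeds.
#   $1   = maximum number of attempts
#   $2.. = command to run
# RETRY_DELAY (seconds between attempts) defaults to 2.
retry() {
  max_attempts=$1
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-2}"
  done
}
```

A call such as retry 3 struktur extract-file --input doc.pdf --schema schema.json --output output.json would try the extraction at most three times and return a non-zero status if every attempt fails.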

Tips and tricks

Batch processing

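# Create the output directory before writing results into it
mkdir -p results

for file in documents/*.pdf; do
  # Skip the literal pattern when no PDFs match
  [ -e "$file" ] || continue
  name=$(basename "$file" .pdf)
  struktur extract-file \
    --input "$file" \
    --schema schema.json \
    --output "results/${name}.json"
done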
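When a batch is interrupted, re-running it repeats work that already finished. One way to make the loop resumable is to skip inputs whose result file already exists and is non-empty. This is a sketch of that idea and not a CLI feature. output_path_for and needs_extraction are hypothetical helpers, and the check does not detect results that are stale or incomplete.

```shell
# Hypothetical helper: map an input PDF to its result path.
#   $1 = input file, $2 = output directory
output_path_for() {
  printf '%s/%s.json\n' "$2" "$(basename "$1" .pdf)"
}

# Hypothetical helper: succeed when the result file is missing or empty.
needs_extraction() {
  [ ! -s "$1" ]
}

# Example loop (shown as comments so the helpers can be sourced on their own):
#   for file in documents/*.pdf; do
#     out=$(output_path_for "$file" results)
#     needs_extraction "$out" || continue
#     struktur extract-file --input "$file" --schema schema.json --output "$out"
#   done
```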

Using jq for schema generation

# Generate schema from example JSON
echo '{"title": "Example", "count": 42}' | \
  jq '{type: "object", properties: (to_entries | map({key: .key, value: {type: (.value | type)}}) | from_entries), required: keys}' > schema.json

Piping between commands

# Extract, then filter, then format
struktur extract-file \
  --input document.pdf \
  --schema schema.json | \
  jq '.citations[] | select(.year > 2020)' | \
  jq -s '.' > recent-citations.json
