Artifact objects. You can create custom providers for any document format.
Artifact provider interface
An artifact provider is an async function that takes a file buffer and returns a promise of an `Artifact`:
```typescript
import type { ArtifactProvider, Artifact } from "@mateffy/struktur";

const myProvider: ArtifactProvider = async (buffer: Buffer): Promise<Artifact> => {
  // Parse the buffer and return an Artifact
  return {
    id: "unique-id",
    type: "text", // or "pdf", "image", "file"
    raw: async () => buffer,
    contents: [
      {
        page: 1,
        text: "Extracted text content",
        media: [], // Optional images
      },
    ],
  };
};
```
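The examples on this page use fixed `id` strings such as `"csv-data"`, so two files of the same kind end up with the same id. A minimal sketch, assuming a content hash is an acceptable id for your use case (this is not a scheme the library requires), derives a stable id with Node's `crypto` module. `contentId` and `plainTextProvider` are hypothetical names, and the local interfaces only stand in for the library's `Artifact` and `ArtifactProvider` types.

```typescript
import { createHash } from "node:crypto";

// Local stand-ins for the library types shown above (assumption: the real
// definitions come from "@mateffy/struktur" and may contain more fields).
interface ArtifactContent {
  page?: number;
  text: string;
  media?: unknown[];
}
interface Artifact {
  id: string;
  type: "text" | "pdf" | "image" | "file";
  raw: () => Promise<Buffer>;
  contents: ArtifactContent[];
}
type ArtifactProvider = (buffer: Buffer) => Promise<Artifact>;

// Hypothetical helper: identical bytes always give the same id,
// different bytes (with overwhelming probability) give different ids.
function contentId(prefix: string, buffer: Buffer): string {
  const digest = createHash("sha256").update(buffer).digest("hex");
  return `${prefix}-${digest.slice(0, 16)}`;
}

// Hypothetical plain-text provider that uses the hash-based id.
const plainTextProvider: ArtifactProvider = async (buffer) => ({
  id: contentId("text", buffer),
  type: "text",
  raw: async () => buffer,
  contents: [{ page: 1, text: buffer.toString("utf-8") }],
});
```

Because the id depends only on the bytes, converting the same file twice yields the same id, which can be convenient for caching. If you need a distinct id per upload instead, `randomUUID()` from `node:crypto` is a simpler choice.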
CSV provider example
Create a provider that converts CSV files to artifacts:
```typescript
import { fileToArtifact } from "@mateffy/struktur";
import type { ArtifactProvider } from "@mateffy/struktur";

const csvProvider: ArtifactProvider = async (buffer: Buffer) => {
  const text = buffer.toString("utf-8");
  // Split on \n or \r\n so Windows line endings don't leave a trailing \r
  const lines = text.split(/\r?\n/);
  const header = lines[0];
  const rows = lines.slice(1).filter((line) => line.trim());

  const formatted = [
    `CSV Data (${rows.length} rows)`,
    `Columns: ${header}`,
    "",
    ...rows.map((row, i) => `Row ${i + 1}: ${row}`),
  ].join("\n");

  return {
    id: "csv-data",
    type: "file",
    raw: async () => buffer,
    contents: [{ text: formatted }],
  };
};

// Use the provider
const buffer = Buffer.from("name,age,city\nAlice,30,NYC\nBob,25,LA");
const artifact = await fileToArtifact(buffer, {
  mimeType: "text/csv",
  providers: {
    "text/csv": csvProvider,
  },
});
```
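The provider above passes each row through as a raw string. If you want to pair values with their column names, a naive `row.split(",")` breaks on quoted fields such as `"Smith, Jr."`. A minimal sketch of a quote-aware splitter for RFC 4180-style fields follows; `parseCsvLine` and `formatRow` are hypothetical helpers, and they still assume one record per line (no newlines inside quoted fields). For full CSV support, a dedicated CSV parser library is the safer choice.

```typescript
// Hypothetical helper: split one CSV line into fields, honoring double
// quotes and "" escapes. Assumes the line contains no embedded newlines.
function parseCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current); // field boundary outside quotes
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Hypothetical helper: render a row as "column: value" pairs using the header.
function formatRow(header: string, row: string): string {
  const columns = parseCsvLine(header);
  const values = parseCsvLine(row);
  return columns.map((col, i) => `${col}: ${values[i] ?? ""}`).join(", ");
}

console.log(formatRow("name,age,city", '"Smith, Jr.",30,NYC'));
// name: Smith, Jr., age: 30, city: NYC
```

Inside `csvProvider`, mapping rows through `formatRow(header, row)` instead of emitting them raw would give the model labeled values per row.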
Markdown with frontmatter
Parse Markdown files with YAML frontmatter:
```typescript
import { fileToArtifact } from "@mateffy/struktur";
import type { ArtifactProvider } from "@mateffy/struktur";

const markdownProvider: ArtifactProvider = async (buffer: Buffer) => {
  const text = buffer.toString("utf-8");

  // Extract frontmatter
  const frontmatterMatch = text.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!frontmatterMatch) {
    return {
      id: "markdown",
      type: "text",
      raw: async () => buffer,
      contents: [{ text }],
    };
  }

  const [, frontmatter, content] = frontmatterMatch;

  // Parse YAML frontmatter (simplified: flat "key: value" lines only)
  const metadata: Record<string, string> = {};
  frontmatter.split("\n").forEach((line) => {
    const [key, ...valueParts] = line.split(":");
    if (key && valueParts.length) {
      metadata[key.trim()] = valueParts.join(":").trim();
    }
  });

  return {
    id: metadata.id || "markdown",
    type: "text",
    raw: async () => buffer,
    contents: [{ text: content.trim() }],
    metadata,
  };
};

// Use the provider
const mdBuffer = Buffer.from(`---
title: My Document
author: Alice
---
# Heading
Content here.`);

const artifact = await fileToArtifact(mdBuffer, {
  mimeType: "text/markdown",
  providers: {
    "text/markdown": markdownProvider,
  },
});

console.log(artifact.metadata);
// { title: "My Document", author: "Alice" }
```
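The frontmatter parsing in `markdownProvider` is deliberately simplified: it expects `\n` line endings, treats every line as `key: value`, and keeps surrounding quotes in values. A slightly sturdier sketch, still not a YAML parser, moves the split into a hypothetical `parseFrontmatter` helper that also accepts `\r\n`, skips blank and `#` comment lines, and strips one pair of matching quotes. For real YAML (lists, nested maps, multiline strings), use a YAML library instead.

```typescript
// Hypothetical helper: split a Markdown document into flat key/value
// frontmatter and body. Handles only "key: value" lines, not full YAML.
function parseFrontmatter(text: string): {
  metadata: Record<string, string>;
  body: string;
} {
  const match = text.match(/^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)([\s\S]*)$/);
  if (!match) return { metadata: {}, body: text };

  const [, block, body] = match;
  const metadata: Record<string, string> = {};

  for (const line of block.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip blanks and comments

    const colon = trimmed.indexOf(":");
    if (colon <= 0) continue; // not a "key: value" line

    const key = trimmed.slice(0, colon).trim();
    let value = trimmed.slice(colon + 1).trim();

    // Strip one pair of matching surrounding quotes, e.g. "A: B" -> A: B
    const first = value[0];
    if (value.length >= 2 && (first === '"' || first === "'") && value.endsWith(first)) {
      value = value.slice(1, -1);
    }
    metadata[key] = value;
  }

  return { metadata, body: body.trim() };
}
```

Inside `markdownProvider`, the regex match and the `forEach` loop could then collapse into `const { metadata, body } = parseFrontmatter(text);`.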
Image provider with OCR
Create a provider that extracts text from images:
```typescript
import { fileToArtifact } from "@mateffy/struktur";
import type { ArtifactProvider } from "@mateffy/struktur";

const ocrImageProvider: ArtifactProvider = async (buffer: Buffer) => {
  // In a real implementation, use an OCR library like Tesseract
  const base64 = buffer.toString("base64");

  // Simulated OCR result
  const ocrText = "Text extracted from image via OCR";

  return {
    id: "image-with-text",
    type: "image",
    raw: async () => buffer,
    contents: [
      {
        text: ocrText,
        media: [
          {
            type: "image",
            base64,
          },
        ],
      },
    ],
  };
};

// Use the provider
const imageBuffer = await Bun.file("document.png").arrayBuffer();
const artifact = await fileToArtifact(Buffer.from(imageBuffer), {
  mimeType: "image/png",
  providers: {
    "image/png": ocrImageProvider,
  },
});
```
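The provider above returns a hard-coded string in place of real OCR. One way to keep it testable while you plug in a real engine, sketched here under the assumption that your OCR step can be wrapped as an async `Buffer` to `string` function, is to pass that function in. `createOcrImageProvider` and `Recognize` are hypothetical names; the local types mirror the shape used above, including the `{ type: "image", base64 }` media entry.

```typescript
// Local stand-ins for the library types (assumption: the real ones
// come from "@mateffy/struktur").
interface MediaItem {
  type: "image";
  base64: string;
}
interface ArtifactContent {
  page?: number;
  text: string;
  media?: MediaItem[];
}
interface Artifact {
  id: string;
  type: "text" | "pdf" | "image" | "file";
  raw: () => Promise<Buffer>;
  contents: ArtifactContent[];
}
type ArtifactProvider = (buffer: Buffer) => Promise<Artifact>;

// Any async OCR function: a real engine in production, a stub in tests.
type Recognize = (image: Buffer) => Promise<string>;

// Hypothetical factory: builds an image provider around the given OCR function.
function createOcrImageProvider(recognize: Recognize): ArtifactProvider {
  return async (buffer) => {
    const text = (await recognize(buffer)).trim();
    return {
      id: "image-with-text",
      type: "image",
      raw: async () => buffer,
      contents: [
        {
          page: 1,
          text,
          // Keep the original image so downstream steps can still see it.
          media: [{ type: "image", base64: buffer.toString("base64") }],
        },
      ],
    };
  };
}

// Usage with a stub recognizer, e.g. in a unit test.
const stubProvider = createOcrImageProvider(async () => "  Invoice #42  ");
```

Injecting the OCR call keeps the provider free of engine-specific setup, so the same factory can wrap a local engine or a remote OCR service.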
PDF provider with page splitting
Split PDF pages into separate content entries:
```typescript
import { fileToArtifact } from "@mateffy/struktur";
import type { ArtifactProvider, ArtifactContent } from "@mateffy/struktur";

const pdfProvider: ArtifactProvider = async (buffer: Buffer) => {
  // In a real implementation, use a PDF library like pdf-parse or pdfjs
  // This is a simplified example
  const pages: ArtifactContent[] = [
    { page: 1, text: "Content from page 1" },
    { page: 2, text: "Content from page 2" },
    { page: 3, text: "Content from page 3" },
  ];

  return {
    id: "multi-page-pdf",
    type: "pdf",
    raw: async () => buffer,
    contents: pages,
  };
};

// Use the provider
const pdfBuffer = await Bun.file("document.pdf").arrayBuffer();
const artifact = await fileToArtifact(Buffer.from(pdfBuffer), {
  mimeType: "application/pdf",
  providers: {
    "application/pdf": pdfProvider,
  },
});

console.log(`PDF has ${artifact.contents.length} pages`);
```
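The page list above is hard-coded. If your PDF tool produces one text dump with form-feed characters (`\f`) between pages, as Poppler's `pdftotext` does unless `-nopgbrk` is passed, a small helper can turn that dump into numbered page entries. `splitPages` is a hypothetical helper, the local `ArtifactContent` interface stands in for the library type, and the extraction step itself is left out. Empty pages are kept so page numbers stay aligned with the PDF.

```typescript
// Local stand-in for the library's ArtifactContent (assumption).
interface ArtifactContent {
  page: number;
  text: string;
}

// Hypothetical helper: split a form-feed-separated dump into 1-based pages.
function splitPages(dump: string): ArtifactContent[] {
  const parts = dump.split("\f");

  // pdftotext ends every page, including the last, with "\f",
  // which leaves one empty trailing segment to drop.
  if (parts.length > 1 && parts[parts.length - 1].trim() === "") {
    parts.pop();
  }

  // Keep blank pages in place so numbering matches the source document.
  return parts.map((text, index) => ({ page: index + 1, text: text.trim() }));
}
```

Inside `pdfProvider`, `contents: splitPages(dump)` would replace the hard-coded array once `dump` holds the extracted text.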
Global provider registration
Register providers globally so they are available across your application:
```typescript
import { fileToArtifact, defaultArtifactProviders } from "@mateffy/struktur";

// csvProvider, markdownProvider, pdfProvider, and buffer come from the examples above

// Add providers to the default registry
defaultArtifactProviders["text/csv"] = csvProvider;
defaultArtifactProviders["text/markdown"] = markdownProvider;
defaultArtifactProviders["application/pdf"] = pdfProvider;

// Now use without passing providers each time
const artifact = await fileToArtifact(buffer, {
  mimeType: "text/csv",
  // Uses defaultArtifactProviders automatically
});
```
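In test suites, a provider registered globally in one test stays registered for the rest of the run. A minimal sketch of a hypothetical `withTemporaryProviders` helper, assuming the registry is a plain object keyed by MIME type as in the snippet above, applies overrides for the duration of a callback and restores the previous entries afterwards, even if the callback throws. The provider type is left generic so the sketch does not depend on the library's types.

```typescript
// Generic stand-in: a registry is a plain object keyed by MIME type.
type Registry<P> = Record<string, P>;

// Hypothetical helper: apply overrides, run fn, then restore prior state.
async function withTemporaryProviders<P, T>(
  registry: Registry<P>,
  overrides: Registry<P>,
  fn: () => Promise<T>,
): Promise<T> {
  const saved: Registry<P> = {}; // entries that existed before
  const absent: string[] = []; // keys that did not exist before

  for (const key of Object.keys(overrides)) {
    if (Object.prototype.hasOwnProperty.call(registry, key)) {
      saved[key] = registry[key];
    } else {
      absent.push(key);
    }
    registry[key] = overrides[key];
  }

  try {
    return await fn();
  } finally {
    // Put back the old entries and remove keys that were newly added.
    Object.assign(registry, saved);
    for (const key of absent) delete registry[key];
  }
}
```

This only scopes the change in time: any other call that runs while the callback is pending also sees the overrides, so it suits sequential tests rather than concurrent request handling.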
Modifying `defaultArtifactProviders` affects all subsequent calls to `fileToArtifact`. For isolated environments, pass providers directly to each call.

Multi-tenant isolation
Keep providers isolated per tenant or request:
```typescript
import { fileToArtifact } from "@mateffy/struktur";
import type { ArtifactProvider, ArtifactProviders } from "@mateffy/struktur";

class TenantArtifactRegistry {
  private providers: Map<string, ArtifactProviders> = new Map();

  register(tenantId: string, mimeType: string, provider: ArtifactProvider) {
    if (!this.providers.has(tenantId)) {
      this.providers.set(tenantId, {});
    }
    this.providers.get(tenantId)![mimeType] = provider;
  }

  getProviders(tenantId: string): ArtifactProviders {
    return this.providers.get(tenantId) || {};
  }
}

const registry = new TenantArtifactRegistry();

// Register per tenant (csvProviderA and csvProviderB are placeholders
// for tenant-specific providers you define yourself)
registry.register("tenant-a", "text/csv", csvProviderA);
registry.register("tenant-b", "text/csv", csvProviderB);

// Use per tenant
const artifactA = await fileToArtifact(buffer, {
  mimeType: "text/csv",
  providers: registry.getProviders("tenant-a"),
});
```
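The registry above returns only what each tenant registered explicitly. If you also want a shared base set that every tenant inherits, with per-tenant entries taking precedence, one option is to merge the two into a fresh object on each lookup so that neither the base nor another tenant's entries can be mutated through the result. `LayeredProviderRegistry` is a hypothetical name, and the provider type is left generic so the sketch stands alone.

```typescript
// Generic stand-in: a registry is a plain object keyed by MIME type.
type Registry<P> = Record<string, P>;

// Hypothetical registry: shared base providers plus per-tenant overrides.
class LayeredProviderRegistry<P> {
  private readonly base: Registry<P>;
  private readonly overrides = new Map<string, Registry<P>>();

  constructor(base: Registry<P>) {
    this.base = base;
  }

  register(tenantId: string, mimeType: string, provider: P): void {
    let tenant = this.overrides.get(tenantId);
    if (!tenant) {
      tenant = {};
      this.overrides.set(tenantId, tenant);
    }
    tenant[mimeType] = provider;
  }

  // Fresh object per call: tenant entries win over base entries,
  // and callers cannot mutate the stored registries through the result.
  getProviders(tenantId: string): Registry<P> {
    return { ...this.base, ...(this.overrides.get(tenantId) ?? {}) };
  }
}
```

A new `LayeredProviderRegistry<ArtifactProvider>` seeded with your shared providers could then feed `providers: registry.getProviders(tenantId)` in the same way as the class above.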
Testing custom providers
Test providers with sample buffers:
```typescript
import { expect, test } from "bun:test";
// Assumes csvProvider and markdownProvider from the examples above are in scope

test("CSV provider parses rows correctly", async () => {
  const buffer = Buffer.from("name,age\nAlice,30\nBob,25");
  const artifact = await csvProvider(buffer);

  expect(artifact.type).toBe("file");
  expect(artifact.contents[0].text).toContain("2 rows");
  expect(artifact.contents[0].text).toContain("Alice,30");
});

test("Markdown provider extracts frontmatter", async () => {
  const buffer = Buffer.from("---\ntitle: Test\n---\nContent");
  const artifact = await markdownProvider(buffer);

  expect(artifact.metadata?.title).toBe("Test");
  expect(artifact.contents[0].text).toBe("Content");
});
```
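Beyond checking individual fields, it can help to run every provider's output through the same structural checks. A minimal sketch of a hypothetical `assertValidArtifact` function, assuming the artifact shape shown at the top of this page, throws on the first violation, so it works with `bun:test` or any other runner. The specific rules (non-empty contents, strictly increasing page numbers, `raw()` returning the input bytes) are illustrative choices, not constraints the library is known to enforce.

```typescript
// Local stand-ins for the library types (assumption).
interface ArtifactContent {
  page?: number;
  text: string;
  media?: unknown[];
}
interface Artifact {
  id: string;
  type: string;
  raw: () => Promise<Buffer>;
  contents: ArtifactContent[];
}

const ARTIFACT_TYPES = new Set(["text", "pdf", "image", "file"]);

// Hypothetical contract check for provider output; throws on the first problem.
async function assertValidArtifact(artifact: Artifact, input: Buffer): Promise<void> {
  if (!artifact.id) throw new Error("artifact.id must be a non-empty string");
  if (!ARTIFACT_TYPES.has(artifact.type)) {
    throw new Error(`unknown artifact.type "${artifact.type}"`);
  }
  if (artifact.contents.length === 0) throw new Error("artifact.contents must not be empty");

  // Where pages are numbered, they should be strictly increasing.
  let lastPage = 0;
  for (const content of artifact.contents) {
    if (typeof content.text !== "string") throw new Error("every content entry needs text");
    if (content.page !== undefined) {
      if (content.page <= lastPage) throw new Error(`page ${content.page} is out of order`);
      lastPage = content.page;
    }
  }

  // raw() should hand back the original bytes.
  const raw = await artifact.raw();
  if (!raw.equals(input)) throw new Error("artifact.raw() does not return the input buffer");
}
```

Inside a `bun:test` case this becomes a one-liner such as `await assertValidArtifact(await csvProvider(input), input);`.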
Next steps
Artifact types: learn about artifact structure and types.
Provider API: complete provider API reference.