
Overview

Zerox is a vision-powered OCR SDK that converts documents (PDFs, images, Excel files) to markdown using large language models. This guide covers the basic setup and usage.

Installation

npm install zerox-ocr

Basic Example

import { zerox } from 'zerox-ocr';

const result = await zerox({
  filePath: 'path/to/document.pdf',
  model: 'gpt-4o',
  credentials: {
    apiKey: process.env.OPENAI_API_KEY
  }
});

console.log(result.pages);
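The basic example passes `process.env.OPENAI_API_KEY` straight through, so a missing variable only surfaces once the provider rejects the request. Here is a minimal sketch that fails fast instead. `requireEnv` is a hypothetical helper, not part of Zerox.

```typescript
// Hypothetical helper (not part of Zerox): return an environment variable,
// or throw a descriptive error if it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

Usage: `credentials: { apiKey: requireEnv('OPENAI_API_KEY', process.env) }`. Passing the environment in explicitly keeps the helper easy to test.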

File Input Options

The filePath option accepts either a local path or a remote URL:

Local Files

Provide a local file path:
const result = await zerox({
  filePath: './documents/invoice.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY }
});

Remote URLs

Provide a URL, and Zerox downloads the file before processing it:
const result = await zerox({
  filePath: 'https://example.com/document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY }
});
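Both cases go through the same `filePath` option. You may want to branch on the input kind yourself, for example to check that a local file exists before calling `zerox()`. The sketch below distinguishes the two. `isRemoteUrl` is hypothetical, and treating only `http:` and `https:` as remote is an assumption, not a description of Zerox's own check.

```typescript
// Hypothetical helper: true if filePath parses as an http(s) URL.
// Some strings that are not web URLs (e.g. "ftp://..." or possibly a Windows
// drive path) still parse, so the protocol is checked, not just parse success.
function isRemoteUrl(filePath: string): boolean {
  try {
    const url = new URL(filePath);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    // new URL() throws for strings that are not absolute URLs,
    // such as relative paths like "./documents/invoice.pdf".
    return false;
  }
}
```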

Supported File Types

  • PDF: .pdf
  • Images: .png, .jpg, .jpeg, .heic
  • Documents: Word documents (converted to PDF)
  • Spreadsheets: Excel files (.xlsx, .xls)
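Rejecting a file with an unsupported extension before calling `zerox()` avoids a wasted request. Here is a minimal pre-check over the extensions listed above. The list names no extensions for Word documents, so the hypothetical `SUPPORTED_EXTENSIONS` set leaves them out rather than guessing.

```typescript
// Extensions taken from the list above. Word formats are omitted because
// the list does not name their extensions; add them if your setup needs them.
const SUPPORTED_EXTENSIONS = new Set([
  ".pdf", ".png", ".jpg", ".jpeg", ".heic", ".xlsx", ".xls",
]);

// Hypothetical helper: lower-cased extension of a path or URL, ignoring any
// query string or fragment (e.g. "https://x.com/a.PDF?v=1" gives ".pdf").
function extensionOf(filePath: string): string {
  const withoutQuery = filePath.split(/[?#]/)[0] ?? filePath;
  const lastSep = Math.max(withoutQuery.lastIndexOf("/"), withoutQuery.lastIndexOf("\\"));
  const base = withoutQuery.slice(lastSep + 1);
  const dot = base.lastIndexOf(".");
  // dot > 0 skips names with no dot and dotfiles such as ".env".
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

function isSupported(filePath: string): boolean {
  return SUPPORTED_EXTENSIONS.has(extensionOf(filePath));
}
```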

Required Parameters

filePath (string, required)
Local path or URL of the file to process.

credentials (object, required)
API credentials for the model provider. For OpenAI:
{ apiKey: "your-api-key" }

Basic Configuration

model (string, default: "gpt-4o")
The vision model to use. Options include:
  • gpt-4o (default)
  • gpt-4o-mini
  • gpt-4.1
  • gpt-4.1-mini
  • Claude models via Bedrock
  • Gemini models via Google AI

modelProvider (string, default: "OPENAI")
The model provider. Options:
  • OPENAI (default)
  • AZURE
  • BEDROCK
  • GOOGLE

outputDir (string, optional)
Directory in which to save the output markdown file.
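Only `filePath` and `credentials` are required. Everything in this section falls back to the defaults shown. The sketch below shows how those documented defaults resolve, which can help when logging the effective configuration. The `BasicConfig` type and `resolveConfig` function are hypothetical: they mirror this page, not Zerox's internal option handling.

```typescript
// Provider names as listed above.
type ModelProvider = "OPENAI" | "AZURE" | "BEDROCK" | "GOOGLE";

// Hypothetical subset of the options documented in this section.
interface BasicConfig {
  model?: string;
  modelProvider?: ModelProvider;
  outputDir?: string;
}

interface ResolvedConfig {
  model: string;
  modelProvider: ModelProvider;
  outputDir?: string; // no documented default
}

// Apply the documented defaults ("gpt-4o" and "OPENAI") to unset fields.
function resolveConfig(config: BasicConfig = {}): ResolvedConfig {
  return {
    model: config.model ?? "gpt-4o",
    modelProvider: config.modelProvider ?? "OPENAI",
    outputDir: config.outputDir,
  };
}
```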

Output Structure

The zerox() function is asynchronous and resolves to a ZeroxOutput object:
interface ZeroxOutput {
  completionTime: number;      // Time taken in milliseconds
  extracted: Record<string, unknown> | null;  // Extracted data (if schema provided)
  fileName: string;            // Processed file name
  inputTokens: number;         // Total input tokens used
  outputTokens: number;        // Total output tokens used
  pages: Page[];               // Array of processed pages
  summary: Summary;            // Processing summary
}

Page Structure

Each page in the pages array contains:
interface Page {
  content?: string;           // Markdown content
  contentLength?: number;     // Length of content
  page: number;               // Page number (1-indexed)
  status: "SUCCESS" | "ERROR";  // Processing status
  error?: string;             // Error message if failed
  inputTokens?: number;       // Tokens used for this page
  outputTokens?: number;      // Tokens generated for this page
}
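Each page carries its own `status` and optional `error`, so code should not assume every page has `content`. The sketch below collects failed pages for logging or a retry. The `Page` interface is copied from above, and `failedPages` is a hypothetical helper.

```typescript
// Same shape as the Page interface above.
interface Page {
  content?: string;
  contentLength?: number;
  page: number;
  status: "SUCCESS" | "ERROR";
  error?: string;
  inputTokens?: number;
  outputTokens?: number;
}

interface PageFailure {
  page: number;
  error: string;
}

// Collect the page number and error message of every page that failed.
function failedPages(pages: Page[]): PageFailure[] {
  return pages
    .filter((p) => p.status === "ERROR")
    .map((p) => ({ page: p.page, error: p.error ?? "unknown error" }));
}
```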

Summary Structure

interface Summary {
  totalPages: number;
  ocr: {
    successful: number;
    failed: number;
  } | null;
  extracted: {
    successful: number;
    failed: number;
  } | null;
}
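The `ocr` and `extracted` counters are typed as nullable, so code that reads them must handle `null`. The sketch below turns the OCR counters into a success ratio. `ocrSuccessRate` is a hypothetical helper.

```typescript
// Same shape as the Summary interface above.
interface Counts {
  successful: number;
  failed: number;
}

interface Summary {
  totalPages: number;
  ocr: Counts | null;
  extracted: Counts | null;
}

// Fraction of OCR'd pages that succeeded. Returns null when the OCR
// counters are absent or no pages were attempted.
function ocrSuccessRate(summary: Summary): number | null {
  if (summary.ocr === null) return null;
  const attempted = summary.ocr.successful + summary.ocr.failed;
  return attempted === 0 ? null : summary.ocr.successful / attempted;
}
```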

Example Output

const result = await zerox({
  filePath: 'invoice.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY }
});

console.log(`Processed ${result.summary.totalPages} pages`);
console.log(`Completion time: ${result.completionTime}ms`);
console.log(`Tokens used: ${result.inputTokens + result.outputTokens}`);

// Access individual pages
result.pages.forEach(page => {
  if (page.status === 'SUCCESS') {
    console.log(`Page ${page.page}: ${page.content?.substring(0, 100)}...`);
  }
});

// Get full document as markdown, skipping pages that failed
const fullMarkdown = result.pages
  .filter(page => page.status === 'SUCCESS')
  .map(page => page.content ?? '')
  .join('\n\n');
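Joining pages with blank lines loses track of where each page began. If downstream code needs to map text back to page numbers, one option is to emit a marker before each page. The HTML comment format and the `toMarkdownWithPageMarkers` name are arbitrary choices for this sketch, not Zerox output.

```typescript
// Minimal subset of the Page interface needed here.
interface Page {
  content?: string;
  page: number;
  status: "SUCCESS" | "ERROR";
}

// Join successful pages in page order, prefixing each with an HTML comment
// marker (invisible in most rendered markdown). Sorting is defensive.
function toMarkdownWithPageMarkers(pages: Page[]): string {
  return [...pages]
    .sort((a, b) => a.page - b.page)
    .filter((p) => p.status === "SUCCESS")
    .map((p) => `<!-- page ${p.page} -->\n${p.content ?? ""}`)
    .join("\n\n");
}
```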

Writing Output to File

To save the markdown output to disk automatically, set outputDir:
const result = await zerox({
  filePath: 'document.pdf',
  outputDir: './output',
  credentials: { apiKey: process.env.OPENAI_API_KEY }
});

// File saved to: ./output/{fileName}.md
console.log(`Saved to: ./output/${result.fileName}.md`);

Temporary Files

tempDir (string, default: os.tmpdir())
Directory for temporary files created during processing.

cleanup (boolean, default: true)
Whether to delete temporary files after processing.
const result = await zerox({
  filePath: 'document.pdf',
  tempDir: './temp',
  cleanup: false,  // Keep temp files for debugging
  credentials: { apiKey: process.env.OPENAI_API_KEY }
});
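With `cleanup: false`, files from repeated runs against a shared `tempDir` may end up mixed together. The sketch below gives each debugging run its own directory using Node's `fs.mkdtempSync`, then removes it when you are done. The helper names are hypothetical, and this sketch does not assume anything about how Zerox lays out files inside `tempDir`.

```typescript
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Create a fresh, uniquely named directory (e.g. /tmp/zerox-debug-XXXXXX,
// where mkdtempSync fills in the random suffix) to pass as tempDir.
function makeRunTempDir(prefix = "zerox-debug-"): string {
  return mkdtempSync(join(tmpdir(), prefix));
}

// Remove the directory and everything in it once you have inspected it.
function removeRunTempDir(dir: string): void {
  rmSync(dir, { recursive: true, force: true });
}
```

Usage: `const tempDir = makeRunTempDir();`, then pass `tempDir` together with `cleanup: false` to `zerox()`, and call `removeRunTempDir(tempDir)` after debugging.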
