Basic Usage

Overview

Zerox is a vision-powered OCR SDK that converts documents (PDFs, images, Excel files) to markdown using large language models. This guide covers the basic setup and usage.

Installation

Node.js
Python

npm install zerox-ocr

pip install zerox-ocr

Basic Example

Node.js
Python

import { zerox } from 'zerox-ocr';

const result = await zerox({
  filePath: 'path/to/document.pdf',
  model: 'gpt-4o',
  credentials: {
    apiKey: process.env.OPENAI_API_KEY
  }
});

console.log(result.pages);

from zerox import zerox

result = await zerox(
    file_path="path/to/document.pdf",
    model="gpt-4o",
    credentials={
        "api_key": os.getenv("OPENAI_API_KEY")
    }
)

print(result.pages)

File Input Options

Zerox accepts multiple input formats:

Local Files

Provide a local file path:

const result = await zerox({
  filePath: './documents/invoice.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY }
});

Remote URLs

Provide a URL to download the file:

const result = await zerox({
  filePath: 'https://example.com/document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY }
});

Supported File Types

PDF: .pdf
Images: .png, .jpg, .jpeg, .heic
Documents: Word documents (converted to PDF)
Spreadsheets: Excel files (.xlsx, .xls)

Required Parameters

filePath

string

required

Path to the file (local or URL) to process

credentials

object

required

API credentials for the model provider. For OpenAI:

{ apiKey: "your-api-key" }

Basic Configuration

model

string

default:"gpt-4o"

The vision model to use. Options include:

gpt-4o (default)
gpt-4o-mini
gpt-4.1
gpt-4.1-mini
Claude models via Bedrock
Gemini models via Google AI

modelProvider

string

default:"OPENAI"

The model provider. Options:

OPENAI (default)
AZURE
BEDROCK
GOOGLE

outputDir

string

Optional directory to save the output markdown file

Output Structure

The zerox() function returns a ZeroxOutput object:

interface ZeroxOutput {
  completionTime: number;      // Time taken in milliseconds
  extracted: Record<string, unknown> | null;  // Extracted data (if schema provided)
  fileName: string;            // Processed file name
  inputTokens: number;         // Total input tokens used
  outputTokens: number;        // Total output tokens used
  pages: Page[];               // Array of processed pages
  summary: Summary;            // Processing summary
}

Page Structure

Each page in the pages array contains:

interface Page {
  content?: string;           // Markdown content
  contentLength?: number;     // Length of content
  page: number;               // Page number (1-indexed)
  status: "SUCCESS" | "ERROR";  // Processing status
  error?: string;             // Error message if failed
  inputTokens?: number;       // Tokens used for this page
  outputTokens?: number;      // Tokens generated for this page
}

Summary Structure

interface Summary {
  totalPages: number;
  ocr: {
    successful: number;
    failed: number;
  } | null;
  extracted: {
    successful: number;
    failed: number;
  } | null;
}

Example Output

const result = await zerox({
  filePath: 'invoice.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY }
});

console.log(`Processed ${result.summary.totalPages} pages`);
console.log(`Completion time: ${result.completionTime}ms`);
console.log(`Tokens used: ${result.inputTokens + result.outputTokens}`);

// Access individual pages
result.pages.forEach(page => {
  if (page.status === 'SUCCESS') {
    console.log(`Page ${page.page}: ${page.content?.substring(0, 100)}...`);
  }
});

// Get full document as markdown
const fullMarkdown = result.pages
  .map(page => page.content)
  .join('\n\n');

Writing Output to File

To automatically save the markdown output:

const result = await zerox({
  filePath: 'document.pdf',
  outputDir: './output',
  credentials: { apiKey: process.env.OPENAI_API_KEY }
});

// File saved to: ./output/{fileName}.md
console.log(`Saved to: ./output/${result.fileName}.md`);

Temporary Files

tempDir

string

default:"os.tmpdir()"

Directory for temporary files during processing

cleanup

boolean

default:true

Whether to clean up temporary files after processing

const result = await zerox({
  filePath: 'document.pdf',
  tempDir: './temp',
  cleanup: false,  // Keep temp files for debugging
  credentials: { apiKey: process.env.OPENAI_API_KEY }
});

Next Steps

Data Extraction - Extract structured data using schemas
Advanced Options - Fine-tune OCR behavior
Error Handling - Handle errors and retries
Performance Tuning - Optimize for speed and cost

Get Started

Installation

Core Concepts

Guides

Overview

Installation

Basic Example

File Input Options

Local Files

Remote URLs

Supported File Types

Required Parameters

Basic Configuration

Output Structure

Page Structure

Summary Structure

Example Output

Writing Output to File

Temporary Files

Next Steps

Build docs developers (and LLMs) love

Get Started

Installation

Core Concepts

Guides

​Overview

​Installation

​Basic Example

​File Input Options

​Local Files

​Remote URLs

​Supported File Types

​Required Parameters

​Basic Configuration

​Output Structure

​Page Structure

​Summary Structure

​Example Output

​Writing Output to File

​Temporary Files

​Next Steps

Build docs developers (and LLMs) love

Overview

Installation

Basic Example

File Input Options

Local Files

Remote URLs

Supported File Types

Required Parameters

Basic Configuration

Output Structure

Page Structure

Summary Structure

Example Output

Writing Output to File

Temporary Files

Next Steps