
Installation

Install the Zerox package:
npm install zerox

System Dependencies

Zerox requires GraphicsMagick and Ghostscript to convert PDF pages into images. These are usually installed automatically, but you may need to install them manually. On Debian or Ubuntu:
sudo apt-get update
sudo apt-get install -y graphicsmagick ghostscript
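Before running Zerox, you can check that both tools are on your PATH. This is a minimal sketch and is not part of Zerox. findOnPath and checkSystemDependencies are hypothetical helpers, and the sketch assumes the usual Unix command names gm (GraphicsMagick) and gs (Ghostscript), which may differ on Windows.

```typescript
import { accessSync, constants, statSync } from "fs";
import path from "path";

// Hypothetical helper (not part of Zerox): returns the full path of an
// executable file found in one of the PATH directories, or null if none exists.
function findOnPath(command: string, envPath: string = process.env.PATH ?? ""): string | null {
  for (const dir of envPath.split(path.delimiter)) {
    if (!dir) continue; // skip empty PATH entries
    const candidate = path.join(dir, command);
    try {
      if (statSync(candidate).isFile()) {
        accessSync(candidate, constants.X_OK); // throws if not executable
        return candidate;
      }
    } catch {
      // Missing or not executable in this directory; keep searching.
    }
  }
  return null;
}

// Assumption: GraphicsMagick is invoked as "gm" and Ghostscript as "gs".
function checkSystemDependencies(): string[] {
  return ["gm", "gs"].filter((cmd) => findOnPath(cmd) === null);
}

const missing = checkSystemDependencies();
if (missing.length > 0) {
  console.warn(`Missing system dependencies: ${missing.join(", ")}`);
}
```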

Set Up API Keys

Zerox uses vision models from several providers. Set the API key for your chosen provider; the examples on this page use OpenAI:
export OPENAI_API_KEY="your-api-key-here"
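If the variable is not set, process.env.OPENAI_API_KEY evaluates to undefined and the failure only surfaces once a request is made. A small guard can fail fast instead. This is a sketch; requireEnv is a hypothetical helper, not a Zerox function.

```typescript
// Hypothetical helper (not part of Zerox): read a required environment variable
// and throw a clear error instead of passing undefined as the API key.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Environment variable ${name} is not set. Export it before running Zerox.`);
  }
  return value;
}

// Usage: pass the result as credentials.apiKey.
// const apiKey = requireEnv("OPENAI_API_KEY");
```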

Basic Usage

Process a PDF from a URL

import { zerox } from "zerox";

const result = await zerox({
  filePath: "https://omni-demo-data.s3.amazonaws.com/test/cs101.pdf",
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});

console.log(result.pages[0].content); // Markdown content
console.log(`Processed ${result.pages.length} pages`);
console.log(`Total tokens: ${result.inputTokens + result.outputTokens}`);

Process a Local File

import { zerox } from "zerox";
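import path from "path";
import { fileURLToPath } from "url";

// __dirname is not defined in ES modules (which top-level await requires),
// so derive it from import.meta.url
const __dirname = path.dirname(fileURLToPath(import.meta.url));

const result = await zerox({
  filePath: path.resolve(__dirname, "./document.pdf"),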
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// Access individual pages
result.pages.forEach((page) => {
  console.log(`Page ${page.page}:`);
  console.log(page.content);
});
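If you want each page in its own file, you can write the pages yourself. This sketch assumes only the page and content fields Zerox returns for each page; PageLike and writePagesToDir are hypothetical names, and the page-N.md naming scheme is an arbitrary choice.

```typescript
import { mkdirSync, writeFileSync } from "fs";
import path from "path";

// Minimal shape of a Zerox page, limited to the fields used here.
interface PageLike {
  page: number;      // 1-indexed page number
  content?: string;  // Markdown content (treated as optional in case a page failed)
}

// Hypothetical helper (not part of Zerox): writes each page to <dir>/page-<n>.md
// and returns the paths that were written.
function writePagesToDir(pages: PageLike[], dir: string): string[] {
  mkdirSync(dir, { recursive: true }); // create the directory if needed
  const written: string[] = [];
  for (const page of pages) {
    if (page.content === undefined) continue; // nothing to write for this page
    const filePath = path.join(dir, `page-${page.page}.md`);
    writeFileSync(filePath, page.content, "utf8");
    written.push(filePath);
  }
  return written;
}

// Usage with a Zerox result:
// writePagesToDir(result.pages, "./output/pages");
```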

Save Output to File

import { zerox } from "zerox";

const result = await zerox({
  filePath: "path/to/document.pdf",
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
  outputDir: "./output", // Saves as output/document.md
});

console.log(`Saved to: ./output/${result.fileName}.md`);

Understanding the Output

Zerox returns a structured object with the following information:
{
  completionTime: number,      // Time taken in milliseconds
  fileName: string,            // Sanitized filename without extension
  inputTokens: number,         // Total input tokens used
  outputTokens: number,        // Total output tokens used
  pages: [                     // Array of processed pages
    {
      page: number,            // Page number (1-indexed)
      content: string,         // Markdown content
      contentLength: number,   // Length of content
      status: "SUCCESS"        // Processing status
    }
  ],
  summary: {
    totalPages: number,
    ocr: {
      successful: number,
      failed: number
    }
  }
}
The pages array contains the Markdown content for each page. You can combine them or process them individually based on your needs.
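For example, to combine the pages into one Markdown document while keeping track of pages that did not succeed, you could use something like the sketch below. ZeroxPage is a minimal local type covering only the fields used here, combinePages is a hypothetical helper, and it treats any status other than "SUCCESS" as a failure.

```typescript
// Minimal local type mirroring part of the documented page shape.
interface ZeroxPage {
  page: number;
  content?: string;
  status: string; // e.g. "SUCCESS"
}

interface CombinedResult {
  markdown: string;
  failedPages: number[];
}

// Hypothetical helper (not part of Zerox): joins successful pages in page order
// with a separator and reports the page numbers that did not succeed.
function combinePages(pages: ZeroxPage[], separator = "\n\n---\n\n"): CombinedResult {
  const sorted = [...pages].sort((a, b) => a.page - b.page);
  const ok = sorted.filter((p) => p.status === "SUCCESS" && p.content !== undefined);
  const failedPages = sorted.filter((p) => !ok.includes(p)).map((p) => p.page);
  return {
    markdown: ok.map((p) => p.content as string).join(separator),
    failedPages,
  };
}

// Usage with a Zerox result:
// const { markdown, failedPages } = combinePages(result.pages);
```

The default separator is a Markdown horizontal rule surrounded by blank lines, so page boundaries stay visible in the combined document.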

Using Different Models

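Zerox supports multiple vision model providers. Select a provider with modelProvider and a model with model. The example below uses OpenAI's GPT-4o; other providers may require different credential fields (see the Node.js API Reference):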
import { zerox } from "zerox";
import { ModelOptions, ModelProvider } from "zerox/node-zerox/dist/types";

const result = await zerox({
  filePath: "path/to/file.pdf",
  modelProvider: ModelProvider.OPENAI,
  model: ModelOptions.OPENAI_GPT_4O,
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});

Common Configuration Options

Here are some frequently used options to customize Zerox’s behavior:
import { zerox } from "zerox";

const result = await zerox({
  filePath: "path/to/file.pdf",
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
  
  // Process only specific pages (e.g., pages 1, 3, and 5)
  pagesToConvertAsImages: [1, 3, 5],
  
  // Control parallel processing
  concurrency: 5,
  
  // Maintain consistent formatting across pages (slower but better for tables)
  maintainFormat: true,
  
  // Save combined markdown to a file
  outputDir: "./output",
  
  // Keep temporary images after processing
  cleanup: false,
  
  // Add custom instructions for the vision model
  prompt: "Extract all tables and preserve their structure exactly.",
  
  // Adjust image quality
  imageDensity: 300,  // DPI for image conversion
  imageHeight: 2048,  // Maximum height in pixels
});
The maintainFormat option processes pages sequentially (not in parallel) because it passes the previous page’s output as context. This is slower but produces more consistent formatting, especially for tables that span multiple pages.
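The difference between the two modes can be sketched generically. This is not Zerox's implementation: ProcessPage is a hypothetical stand-in for a vision-model call, and the worker pool is just one common way to cap concurrency. In concurrent mode up to concurrency pages are in flight at once and none sees another page's output; in sequential mode each call receives the previous page's output, so every page must wait for the one before it.

```typescript
// Hypothetical stand-in for a model call: receives a page number and,
// optionally, the previous page's Markdown as formatting context.
type ProcessPage = (page: number, previousOutput?: string) => Promise<string>;

// Concurrent mode: a worker pool with at most `concurrency` calls in flight.
async function processConcurrently(
  pages: number[],
  concurrency: number,
  processPage: ProcessPage,
): Promise<string[]> {
  const results: string[] = new Array(pages.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < pages.length) {
      // Claiming an index happens synchronously between awaits,
      // so two workers never take the same page.
      const index = next++;
      results[index] = await processPage(pages[index]);
    }
  }
  const workerCount = Math.max(1, Math.min(concurrency, pages.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Sequential mode: each page waits for the previous one because the
// previous output is passed in as context.
async function processSequentially(pages: number[], processPage: ProcessPage): Promise<string[]> {
  const results: string[] = [];
  let previous: string | undefined;
  for (const page of pages) {
    previous = await processPage(page, previous);
    results.push(previous);
  }
  return results;
}
```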

Next Steps

Structured Data Extraction

Learn how to extract specific fields using JSON schemas

Advanced Configuration

Explore all configuration options and fine-tuning

Node.js API Reference

Complete API documentation for all parameters

Examples

Real-world examples and use cases
