Introduction

What is Zerox?

Zerox is a vision-powered OCR SDK that converts documents into clean Markdown for AI ingestion. Instead of a traditional OCR engine, Zerox uses vision language models (GPT-4o, Claude, Gemini, etc.) to interpret document layouts, tables, charts, and complex formatting, which makes it a good fit for AI applications that need structured document data.

Try the Demo

Test Zerox with your documents in the hosted demo

How It Works

Zerox follows a simple four-step process:

Pass in a file

Provide a PDF, DOCX, image, or other supported document format via file path or URL

Convert to images

Zerox converts your document into a series of high-quality images (one per page)

Extract with vision models

Each image is sent to your chosen vision model (GPT-4o, Claude, Gemini, etc.) with instructions to extract Markdown

Aggregate and return

The Markdown responses are aggregated and returned with metadata (tokens, completion time, etc.)
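The last two steps can be sketched in a few lines of Python. This is an illustrative outline only, not Zerox's implementation: `run_pipeline`, `fake_model`, and the `markdown` key are hypothetical names, and the model call is replaced by a local stand-in so the example runs without any API access.

```python
import time
from typing import Callable, List


def run_pipeline(
    page_images: List[bytes],
    extract_markdown: Callable[[bytes], str],
) -> dict:
    """Send each page image to a model callback and aggregate the Markdown."""
    started = time.perf_counter()
    pages = []
    for number, image in enumerate(page_images, start=1):
        content = extract_markdown(image)  # one model call per page
        pages.append(
            {"page": number, "content": content, "contentLength": len(content)}
        )
    elapsed_ms = int((time.perf_counter() - started) * 1000)
    return {
        "completionTime": elapsed_ms,
        "pages": pages,
        # Hypothetical convenience field: all pages joined in order.
        "markdown": "\n\n".join(p["content"] for p in pages),
    }


def fake_model(image: bytes) -> str:
    """Stand-in for a vision model; a real callback would call a provider API."""
    return f"# Page with {len(image)} bytes"


result = run_pipeline([b"abc", b"defgh"], fake_model)
print(result["markdown"])
```

Keeping the model call behind a callback makes the aggregation logic easy to test without spending tokens.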

Key Features

Multi-Format Support

Process 20+ document formats, including PDFs, Word documents, images (PNG, JPEG, HEIC), spreadsheets, and presentations

Multiple Model Providers

Works with OpenAI, Azure OpenAI, AWS Bedrock, Google Gemini, and Vertex AI

Structured Data Extraction

Extract specific fields using JSON schemas instead of full document conversion

Format Preservation

Maintain consistent formatting across pages, which is especially useful for tables and structured content

Smart Image Processing

Automatic orientation correction, edge trimming, and image compression

Concurrent Processing

Process multiple pages in parallel with configurable concurrency limits
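To illustrate the idea behind Concurrent Processing, here is a minimal sketch of bounded parallelism with `asyncio.Semaphore`. It is a general pattern, not Zerox's internal code: `process_pages`, `fake_extract`, and the `concurrency` parameter name are assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, List


async def process_pages(
    pages: List[bytes],
    extract: Callable[[bytes], Awaitable[str]],
    concurrency: int = 10,
) -> List[str]:
    """Run `extract` over all pages, with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(page: bytes) -> str:
        async with semaphore:  # wait here once the limit is reached
            return await extract(page)

    # gather preserves input order, so results line up with page numbers.
    return await asyncio.gather(*(bounded(p) for p in pages))


async def fake_extract(page: bytes) -> str:
    """Stand-in for a model request; sleeps to simulate network latency."""
    await asyncio.sleep(0.01)
    return page.decode().upper()


results = asyncio.run(
    process_pages([b"a", b"b", b"c"], fake_extract, concurrency=2)
)
print(results)
```

A lower limit reduces the chance of hitting provider rate limits, while a higher one shortens total completion time for long documents.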

When to Use Zerox

✅ Perfect for AI applications

Converting documents for RAG (Retrieval Augmented Generation) systems
Extracting data from invoices, receipts, and forms
Processing research papers, reports, and technical documents
Building document analysis pipelines
Creating searchable archives from scanned documents

⚠️ Not ideal for

Real-time OCR on mobile devices (requires API calls to vision models)
Offline-only environments (requires internet for model API access)
Extremely high-volume processing on tight budgets (vision model API costs)
Simple text extraction where traditional OCR is sufficient

Output Format

Zerox returns structured data with per-page content, token usage, and metadata:

{
  "completionTime": 10038,
  "fileName": "invoice_36258",
  "inputTokens": 25543,
  "outputTokens": 210,
  "pages": [
    {
      "page": 1,
      "content": "# INVOICE # 36258\n**Date:** Mar 06 2012...",
      "contentLength": 747,
      "status": "SUCCESS"
    }
  ],
  "summary": {
    "totalPages": 1,
    "ocr": {
      "successful": 1,
      "failed": 0
    }
  }
}
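A result shaped like the one above can be post-processed with ordinary dictionary access. The sketch below is an assumption-laden example rather than part of the SDK: `summarize` is a hypothetical helper, and it treats any `status` other than `"SUCCESS"` as a failed page.

```python
result = {
    "completionTime": 10038,
    "fileName": "invoice_36258",
    "inputTokens": 25543,
    "outputTokens": 210,
    "pages": [
        {
            "page": 1,
            "content": "# INVOICE # 36258\n**Date:** Mar 06 2012...",
            "contentLength": 747,
            "status": "SUCCESS",
        }
    ],
    "summary": {"totalPages": 1, "ocr": {"successful": 1, "failed": 0}},
}


def summarize(result: dict) -> dict:
    """Join successful pages and collect basic usage figures."""
    pages = result["pages"]
    # Assumption: anything other than "SUCCESS" counts as a failed page.
    failed = [p["page"] for p in pages if p["status"] != "SUCCESS"]
    markdown = "\n\n".join(
        p["content"] for p in pages if p["status"] == "SUCCESS"
    )
    return {
        "markdown": markdown,
        "failedPages": failed,
        "totalTokens": result["inputTokens"] + result["outputTokens"],
    }


summary = summarize(result)
print(summary["totalTokens"], summary["failedPages"])
```

Checking the per-page `status` before ingesting content helps keep partial failures out of a downstream index.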

Available SDKs

Zerox is available as both a Node.js and a Python package:

Node.js: npm install zerox
Python: pip install py-zerox

Supported Models

Zerox works with vision models from multiple providers:

Provider	Models
OpenAI	GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini
Azure OpenAI	GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini
AWS Bedrock	Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku (multiple versions)
Google Gemini	Gemini 1.5 (Flash, Flash-8B, Pro), Gemini 2.0 (Flash, Flash-Lite)
Vertex AI	Gemini models (Python only)

All vision models from these providers are supported. Refer to each provider’s documentation for the most up-to-date model availability.

Supported File Types

Zerox supports 20+ document formats including:

Documents: PDF, DOC, DOCX, ODT, RTF, TXT, HTML, XML
Spreadsheets: XLS, XLSX, ODS, CSV, TSV
Presentations: PPT, PPTX, ODP
Images: PNG, JPG, JPEG, HEIC
Other: WPS, WPD

For non-image/non-PDF files, Zerox uses LibreOffice to convert to PDF first, then to images. Make sure LibreOffice is installed on your system if you need to process these formats.
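Before processing office formats, it can help to check which conversion path a file will take and whether LibreOffice is on the PATH. The following is a rough sketch under stated assumptions: `conversion_route` and `libreoffice_available` are hypothetical helpers, not Zerox functions, and the binary names `soffice` and `libreoffice` are the ones commonly used by LibreOffice installs.

```python
import shutil
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".heic"}


def conversion_route(path: str) -> str:
    """Classify a file as 'pdf', 'image', or 'libreoffice' by its extension."""
    ext = Path(path).suffix.lower()
    if ext == ".pdf":
        return "pdf"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    # Everything else (DOCX, XLSX, PPTX, ...) would need a PDF conversion
    # first. This sketch does not validate that the format is supported.
    return "libreoffice"


def libreoffice_available() -> bool:
    """Return True if a LibreOffice command-line binary is on the PATH."""
    return any(shutil.which(name) for name in ("soffice", "libreoffice"))


for name in ("report.DOCX", "scan.heic", "invoice.pdf"):
    route = conversion_route(name)
    if route == "libreoffice" and not libreoffice_available():
        print(f"{name}: needs LibreOffice, which was not found on PATH")
    else:
        print(f"{name}: {route}")
```

Running a check like this at startup surfaces a missing LibreOffice install early instead of at the first office document.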

Next Steps

Quickstart

API Reference
