Zerox

Vision-Powered OCR for AI Ingestion

Convert PDFs, documents, and images to clean markdown using GPT-4o, Claude, Gemini, and other vision models. Documents are visual, after all, so let vision models make sense of complex layouts, tables, and charts.

How It Works

Zerox makes document OCR dead simple by leveraging vision models:

1. Pass in a file: upload a PDF, DOCX, image, or any of 20+ supported formats.
2. Convert to images: the file is converted into a series of high-quality images.
3. Vision model processing: each image is sent to GPT-4o, Claude, or Gemini for markdown conversion.
4. Get markdown output: receive clean, structured markdown ready for AI ingestion and RAG systems.
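
The four steps above can be sketched roughly as follows. This is an illustrative outline only, not Zerox's internals: `convertToImages`, `imageToMarkdown`, `ocrPipeline`, and both types are hypothetical names, and the two helper functions are stubs standing in for real rasterization and a real vision-model call.

```typescript
// Hypothetical types and stubs for illustration; these are not Zerox's API.
type PageImage = { page: number; pixels: string };
type PageMarkdown = { page: number; content: string };

// Step 2 stand-in: a real implementation would rasterize the file.
// Here each input chunk simply becomes one fake "image".
function convertToImages(fileChunks: string[]): PageImage[] {
  return fileChunks.map((chunk, i) => ({ page: i + 1, pixels: chunk }));
}

// Step 3 stand-in: a real implementation would send the image to a vision model.
async function imageToMarkdown(image: PageImage): Promise<PageMarkdown> {
  return {
    page: image.page,
    content: `# Page ${image.page}\n\n${image.pixels}`,
  };
}

// Steps 1-4 chained: file in, per-page markdown out, in page order.
async function ocrPipeline(fileChunks: string[]): Promise<PageMarkdown[]> {
  const images = convertToImages(fileChunks);
  const pages = await Promise.all(images.map((img) => imageToMarkdown(img)));
  return pages.sort((a, b) => a.page - b.page);
}
```

Because each page is converted independently, the per-page calls can run concurrently and be reassembled by page number afterwards.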

Key Features

Multi-Provider Support

Works with OpenAI, Azure OpenAI, AWS Bedrock, and Google Gemini

20+ File Formats

Supports PDF, DOCX, XLSX, images, and more out of the box

Structured Data Extraction

Extract specific fields using JSON schemas for forms, invoices, and tables

Dual SDKs

Available for both Node.js and Python with async APIs

Smart Processing

Auto-corrects orientation, trims edges, and processes pages concurrently

Format Preservation

Maintains consistent formatting across pages, which helps when tables span multiple pages
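
To make the structured data extraction feature more concrete, here is a sketch of what a JSON schema for an invoice might look like, together with a minimal check on an extracted object. The field names (`invoiceNumber`, `total`, `dueDate`) and the `findProblems` helper are assumptions made for illustration; how a schema is actually passed to Zerox is covered in the Data Extraction guide, not here.

```typescript
// Hypothetical invoice schema in JSON Schema form; field names are illustrative.
const invoiceSchema = {
  type: "object",
  properties: {
    invoiceNumber: { type: "string" },
    total: { type: "number" },
    dueDate: { type: "string" },
  },
  required: ["invoiceNumber", "total"],
} as const;

type PrimitiveType = "string" | "number" | "boolean";

type SimpleSchema = {
  properties: Record<string, { type: PrimitiveType }>;
  required: readonly string[];
};

// Minimal check: required keys are present and primitive types match.
// This is not a full JSON Schema validator.
function findProblems(
  schema: SimpleSchema,
  data: Record<string, unknown>,
): string[] {
  const problems: string[] = [];
  for (const key of schema.required) {
    if (!(key in data)) problems.push(`missing: ${key}`);
  }
  for (const key of Object.keys(schema.properties)) {
    if (key in data && typeof data[key] !== schema.properties[key].type) {
      problems.push(`wrong type: ${key}`);
    }
  }
  return problems;
}
```

Checking extracted fields against the schema before storing them is a cheap way to catch pages where the model returned a string for a numeric field or skipped a required one.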

Quick Example

import { zerox } from "zerox";

const result = await zerox({
  filePath: "https://example.com/document.pdf",
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});

console.log(result.pages[0].content); // Markdown output
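
The example above prints only the first page. For ingestion it is often handy to combine every page into one markdown string, which might look like the sketch below. The `PagedResult` type and `toSingleDocument` helper are hypothetical; the only thing assumed about the result is the `pages[i].content` shape visible in the example.

```typescript
// Shape assumed from the example above: each page carries markdown in `content`.
type PagedResult = { pages: { content: string }[] };

// Join all pages into a single markdown document, with a horizontal rule
// between pages by default.
function toSingleDocument(
  result: PagedResult,
  separator: string = "\n\n---\n\n",
): string {
  return result.pages.map((p) => p.content.trim()).join(separator);
}
```

Passing a different `separator` (for example a plain blank line) keeps page boundaries out of the text when they are not useful downstream.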

Trusted by Developers

12,000+ GitHub Stars

Join thousands of developers using Zerox for document processing in production

Get Started

Quickstart

Get up and running in 5 minutes

Node.js Setup

Install the Node.js SDK

Python Setup

Install the Python SDK

Data Extraction

Extract structured data from documents using schemas

Model Providers

Configure OpenAI, Azure, Bedrock, or Gemini

Batch Processing

Process multiple documents efficiently

Invoice Extraction

Extract data from invoices and forms