What is Zerox?
Zerox is a vision-powered OCR SDK that converts documents into clean Markdown format for AI ingestion. Instead of traditional OCR engines, Zerox leverages vision language models (GPT-4 Vision, Claude, Gemini, etc.) to understand document layouts, tables, charts, and complex formatting—making it ideal for AI applications that need structured document data.Try the Demo
Test Zerox with your documents in the hosted demo
How It Works
Zerox follows a simple four-step process:Extract with vision models
Each image is sent to your chosen vision model (GPT-4o, Claude, Gemini, etc.) with instructions to extract Markdown
Key Features
Multi-Format Support
Process PDFs, Word docs, images (PNG, JPEG, HEIC), spreadsheets, presentations, and 20+ document formats
Multiple Model Providers
Works with OpenAI, Azure OpenAI, AWS Bedrock, Google Gemini, and Vertex AI
Structured Data Extraction
Extract specific fields using JSON schemas instead of full document conversion
Format Preservation
Maintain consistent formatting across pages—perfect for tables and structured content
Smart Image Processing
Automatic orientation correction, edge trimming, and image compression
Concurrent Processing
Process multiple pages in parallel with configurable concurrency limits
When to Use Zerox
✅ Perfect for AI applications
✅ Perfect for AI applications
- Converting documents for RAG (Retrieval Augmented Generation) systems
- Extracting data from invoices, receipts, and forms
- Processing research papers, reports, and technical documents
- Building document analysis pipelines
- Creating searchable archives from scanned documents
⚠️ Not ideal for
⚠️ Not ideal for
- Real-time OCR on mobile devices (requires API calls to vision models)
- Offline-only environments (requires internet for model API access)
- Extremely high-volume processing on tight budgets (vision model API costs)
- Simple text extraction where traditional OCR is sufficient
Output Format
Zerox returns structured data with per-page content, token usage, and metadata:Available SDKs
Zerox is available as both a Node.js and Python package:Supported Models
Zerox works with vision models from multiple providers:| Provider | Models |
|---|---|
| OpenAI | GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini |
| Azure OpenAI | GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini |
| AWS Bedrock | Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku (multiple versions) |
| Google Gemini | Gemini 1.5 (Flash, Flash-8B, Pro), Gemini 2.0 (Flash, Flash-Lite) |
| Vertex AI | Gemini models (Python only) |
All vision models from these providers are supported. Refer to each provider’s documentation for the most up-to-date model availability.
Supported File Types
Zerox supports 20+ document formats including:- Documents: PDF, DOC, DOCX, ODT, RTF, TXT, HTML, XML
- Spreadsheets: XLS, XLSX, ODS, CSV, TSV
- Presentations: PPT, PPTX, ODP
- Images: PNG, JPG, JPEG, HEIC
- Other: WPS, WPD
Next Steps
Quickstart
Get started with Zerox in 5 minutes
API Reference
Explore all configuration options

