Overview
Zerox is a vision-powered OCR SDK that converts documents (PDFs, images, Excel files) to markdown using large language models. This guide covers the basic setup and usage.
Installation
Basic Example
import { zerox } from 'zerox-ocr';
const result = await zerox({
filePath: 'path/to/document.pdf',
model: 'gpt-4o',
credentials: {
apiKey: process.env.OPENAI_API_KEY
}
});
console.log(result.pages);
from zerox import zerox
result = await zerox(
file_path="path/to/document.pdf",
model="gpt-4o",
credentials={
"api_key": os.getenv("OPENAI_API_KEY")
}
)
print(result.pages)
Zerox accepts multiple input formats:
Local Files
Provide a local file path:
const result = await zerox({
filePath: './documents/invoice.pdf',
credentials: { apiKey: process.env.OPENAI_API_KEY }
});
Remote URLs
Provide a URL to download the file:
const result = await zerox({
filePath: 'https://example.com/document.pdf',
credentials: { apiKey: process.env.OPENAI_API_KEY }
});
Supported File Types
- PDF:
.pdf
- Images:
.png, .jpg, .jpeg, .heic
- Documents: Word documents (converted to PDF)
- Spreadsheets: Excel files (
.xlsx, .xls)
Required Parameters
Path to the file (local or URL) to process
API credentials for the model provider. For OpenAI:{ apiKey: "your-api-key" }
Basic Configuration
The vision model to use. Options include:
gpt-4o (default)
gpt-4o-mini
gpt-4.1
gpt-4.1-mini
- Claude models via Bedrock
- Gemini models via Google AI
The model provider. Options:
OPENAI (default)
AZURE
BEDROCK
GOOGLE
Optional directory to save the output markdown file
Output Structure
The zerox() function returns a ZeroxOutput object:
interface ZeroxOutput {
completionTime: number; // Time taken in milliseconds
extracted: Record<string, unknown> | null; // Extracted data (if schema provided)
fileName: string; // Processed file name
inputTokens: number; // Total input tokens used
outputTokens: number; // Total output tokens used
pages: Page[]; // Array of processed pages
summary: Summary; // Processing summary
}
Page Structure
Each page in the pages array contains:
interface Page {
content?: string; // Markdown content
contentLength?: number; // Length of content
page: number; // Page number (1-indexed)
status: "SUCCESS" | "ERROR"; // Processing status
error?: string; // Error message if failed
inputTokens?: number; // Tokens used for this page
outputTokens?: number; // Tokens generated for this page
}
Summary Structure
interface Summary {
totalPages: number;
ocr: {
successful: number;
failed: number;
} | null;
extracted: {
successful: number;
failed: number;
} | null;
}
Example Output
const result = await zerox({
filePath: 'invoice.pdf',
credentials: { apiKey: process.env.OPENAI_API_KEY }
});
console.log(`Processed ${result.summary.totalPages} pages`);
console.log(`Completion time: ${result.completionTime}ms`);
console.log(`Tokens used: ${result.inputTokens + result.outputTokens}`);
// Access individual pages
result.pages.forEach(page => {
if (page.status === 'SUCCESS') {
console.log(`Page ${page.page}: ${page.content?.substring(0, 100)}...`);
}
});
// Get full document as markdown
const fullMarkdown = result.pages
.map(page => page.content)
.join('\n\n');
Writing Output to File
To automatically save the markdown output:
const result = await zerox({
filePath: 'document.pdf',
outputDir: './output',
credentials: { apiKey: process.env.OPENAI_API_KEY }
});
// File saved to: ./output/{fileName}.md
console.log(`Saved to: ./output/${result.fileName}.md`);
Temporary Files
tempDir
string
default:"os.tmpdir()"
Directory for temporary files during processing
Whether to clean up temporary files after processing
const result = await zerox({
filePath: 'document.pdf',
tempDir: './temp',
cleanup: false, // Keep temp files for debugging
credentials: { apiKey: process.env.OPENAI_API_KEY }
});
Next Steps