
Overview

Zerox provides advanced options to control image processing, formatting, and OCR behavior. This guide covers all advanced configuration parameters.

Format Preservation

Maintain Format

Preserve document formatting across pages for consistent output:
import { zerox } from 'zerox-ocr';

const result = await zerox({
  filePath: 'report.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  maintainFormat: true  // Preserve formatting across pages
});
maintainFormat
boolean
default:false
When true, Zerox processes pages sequentially and passes the previous page's output as context, so that formatting (headers, tables, lists, etc.) stays consistent from page to page. This increases accuracy but removes parallelism.
Important: Cannot be used with extractOnly mode.
With maintainFormat: true, pages are processed one at a time rather than in parallel, so processing time grows with the number of pages.

How It Works

When enabled, each page receives context from the previous page:
// Internal behavior (simplified)
let priorPage = "";

for (const page of pages) {
  const result = await processPage(page, {
    prompt: `Maintain consistent formatting with: ${priorPage}`
  });
  priorPage = result.content;
}

Image Processing

Orientation Correction

Automatically detect and correct image orientation:
const result = await zerox({
  filePath: 'scanned-document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  correctOrientation: true  // Default: true
});
correctOrientation
boolean
default:true
Uses Tesseract OCR to detect image orientation and rotates images as needed. Improves accuracy for scanned or rotated documents.
The system uses Tesseract workers to analyze each image:
// Determines optimal rotation (0°, 90°, 180°, 270°)
const rotation = await determineOptimalRotation(image);
console.log(`Rotating image ${rotation} degrees`);

Edge Trimming

Remove whitespace and borders from images:
const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  trimEdges: true  // Default: true
});
trimEdges
boolean
default:true
Automatically trims whitespace around the content. Reduces token usage and improves focus on actual content.

Image Conversion Options

Image Format

Choose the format for converted images:
const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  imageFormat: 'jpeg'  // 'png' (default) or 'jpeg'
});
imageFormat
string
default:"png"
Format for converted images:
  • png: Higher quality, larger file size
  • jpeg: Smaller file size, slight quality loss

Image Density

Control the resolution of PDF-to-image conversion:
const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  imageDensity: 300  // DPI (dots per inch)
});
imageDensity
number
DPI for PDF-to-image conversion. Higher values produce clearer images but larger file sizes.
  • 150 DPI: Low quality, faster processing
  • 300 DPI: Standard quality (recommended)
  • 600 DPI: High quality, slower processing
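To make the trade-off concrete, the following sketch computes the pixel dimensions of a rendered page at each of the DPI values listed above. The helper name pixelDimensions is hypothetical and not part of Zerox; the page size is US Letter (8.5 x 11 inches).

```typescript
// Hypothetical helper (not part of Zerox): pixel size of a page rendered at a given DPI.
function pixelDimensions(
  widthInches: number,
  heightInches: number,
  dpi: number
): { width: number; height: number } {
  return {
    width: Math.round(widthInches * dpi),
    height: Math.round(heightInches * dpi),
  };
}

// US Letter is 8.5 x 11 inches.
for (const dpi of [150, 300, 600]) {
  const { width, height } = pixelDimensions(8.5, 11, dpi);
  console.log(`${dpi} DPI: ${width} x ${height} px`);
}
// 150 DPI: 1275 x 1650 px
// 300 DPI: 2550 x 3300 px
// 600 DPI: 5100 x 6600 px
```

Doubling the DPI quadruples the pixel count, which is why 600 DPI images are noticeably larger and slower to process.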

Image Height

Specify a fixed height for converted images:
const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  imageHeight: 2048  // Height in pixels
});
imageHeight
number
Fixed height for images in pixels. Width is adjusted to maintain aspect ratio. Useful for standardizing image sizes.
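As a rough illustration of the aspect-ratio adjustment, the sketch below scales a page to a fixed height. The function scaleToHeight is a hypothetical helper, and rounding to the nearest pixel is an assumption; the actual image library may round differently.

```typescript
// Hypothetical helper (not part of Zerox): scale to a fixed height, keeping aspect ratio.
function scaleToHeight(
  width: number,
  height: number,
  targetHeight: number
): { width: number; height: number } {
  const scale = targetHeight / height;
  // Assumption: width is rounded to the nearest whole pixel.
  return { width: Math.round(width * scale), height: targetHeight };
}

// A 2550 x 3300 px page (US Letter at 300 DPI) resized to a height of 2048 px.
console.log(scaleToHeight(2550, 3300, 2048)); // { width: 1583, height: 2048 }
```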

Image Compression

Automatically compress images to reduce size:
const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  maxImageSize: 10  // Maximum size in MB
});
maxImageSize
number
default:15
Maximum image size in MB. Images larger than this are automatically compressed. Set to 0 to disable compression.
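The compression algorithm (simplified):
// Starts at quality 90 and lowers it by 10 per pass until the image fits
// under maxSize or quality would drop below 20
let quality = 90;
let compressedImage = image;
while (imageSize > maxSize && quality >= 20) {
  compressedImage = await compress(image, quality);
  imageSize = compressedImage.length;  // re-measure so the loop can stop
  quality -= 10;
}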
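The loop can be exercised end to end with a stand-in compressor. This is a minimal sketch, not the library's implementation: compressToLimit and fakeCompress are hypothetical names, 1 MB is assumed to mean 1024 * 1024 bytes, and the fake compressor simply shrinks the buffer in proportion to the quality setting.

```typescript
type Compressor = (image: Uint8Array, quality: number) => Promise<Uint8Array>;

// Hypothetical sketch of the quality step-down loop described above.
async function compressToLimit(
  image: Uint8Array,
  maxSizeMB: number,
  compress: Compressor
): Promise<Uint8Array> {
  if (maxSizeMB === 0) return image; // 0 disables compression
  const maxBytes = maxSizeMB * 1024 * 1024; // assumption: binary megabytes
  let current = image;
  let quality = 90;
  while (current.byteLength > maxBytes && quality >= 20) {
    // Always recompress from the original to avoid stacking artifacts.
    current = await compress(image, quality);
    quality -= 10;
  }
  return current;
}

// Stand-in compressor: output size scales linearly with quality.
const fakeCompress: Compressor = async (image, quality) =>
  image.subarray(0, Math.floor((image.byteLength * quality) / 100));
```

Because the loop stops once quality would fall below 20, the result can still exceed the limit for very large inputs; callers that need a hard cap would have to check the returned size.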

Page Selection

Process specific pages instead of the entire document:
// Process single page
const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  pagesToConvertAsImages: 5  // Only page 5
});

// Process multiple pages
const result2 = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  pagesToConvertAsImages: [1, 3, 5, 7]  // Specific pages
});

// Process all pages (default)
const result3 = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  pagesToConvertAsImages: -1  // All pages (default)
});
pagesToConvertAsImages
number | number[]
Pages to process:
  • -1: Process all pages (default)
  • number: Process single page (1-indexed)
  • number[]: Process specific pages
Invalid page numbers are filtered out automatically.
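The selection rules above can be sketched as a small normalization step. The helper resolvePages is hypothetical, and treating "invalid" as "not an integer between 1 and the page count" is an assumption about what the filter does.

```typescript
type PageSelection = number | number[];

// Hypothetical helper (not part of Zerox): turn a selection into a list of 1-indexed pages.
function resolvePages(selection: PageSelection, numPages: number): number[] {
  if (selection === -1) {
    // -1 means every page in the document.
    return Array.from({ length: numPages }, (_, i) => i + 1);
  }
  const requested = Array.isArray(selection) ? selection : [selection];
  // Assumption: anything outside 1..numPages, or not an integer, is dropped.
  return requested.filter((p) => Number.isInteger(p) && p >= 1 && p <= numPages);
}

console.log(resolvePages(-1, 3));        // [1, 2, 3]
console.log(resolvePages([0, 2, 9], 3)); // [2]
console.log(resolvePages(5, 3));         // []
```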

Concurrency Control

Control parallel processing of pages:
const result = await zerox({
  filePath: 'large-document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  concurrency: 5  // Process 5 pages at a time
});
concurrency
number
default:10
Maximum number of pages to process concurrently. Higher values increase throughput but also raise the risk of hitting API rate limits.
Note: Ignored when maintainFormat: true (pages are processed sequentially).
See Performance Tuning for optimization strategies.
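For readers unfamiliar with bounded concurrency, here is a minimal sketch of the general technique: a fixed number of workers pull items from a shared queue. The function mapWithConcurrency is a hypothetical illustration and is not claimed to be how Zerox schedules pages internally.

```typescript
// Hypothetical helper: run fn over items with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim an item before awaiting
      results[index] = await fn(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as the input
}

// Example: "process" four pages, at most two at a time.
mapWithConcurrency([1, 2, 3, 4], 2, async (page) => `page ${page}`).then((out) =>
  console.log(out)
);
```

Results are written by index, so output order matches input order even when pages finish out of order.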

Tesseract Workers

Configure Tesseract OCR worker pool:
const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  correctOrientation: true,
  maxTesseractWorkers: 8  // Maximum worker threads
});
maxTesseractWorkers
number
Maximum number of Tesseract worker threads for orientation detection:
  • -1: Unlimited (default)
  • number: Limit worker count
Only relevant when correctOrientation: true.
By default, Zerox creates 3 initial workers and scales up as needed:
// From constants.ts
const NUM_STARTING_WORKERS = 3;

// Workers scale based on document size
if (numPages > NUM_STARTING_WORKERS) {
  await addMoreWorkers(numPages - NUM_STARTING_WORKERS);
}
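How the maxTesseractWorkers cap interacts with this scaling is not spelled out above. One plausible reading, sketched below under that assumption, is that the pool grows to one worker per page and is then clamped to the cap. The function targetWorkerCount is hypothetical.

```typescript
const NUM_STARTING_WORKERS = 3;

// Hypothetical helper: how many workers the pool would end up with.
// Assumption: the pool grows to one worker per page, clamped by the cap
// unless the cap is -1 (unlimited).
function targetWorkerCount(numPages: number, maxTesseractWorkers: number): number {
  const wanted = Math.max(NUM_STARTING_WORKERS, numPages);
  return maxTesseractWorkers === -1 ? wanted : Math.min(wanted, maxTesseractWorkers);
}

console.log(targetWorkerCount(2, -1));  // 3  (starting pool only)
console.log(targetWorkerCount(20, -1)); // 20 (unlimited)
console.log(targetWorkerCount(20, 8));  // 8  (capped)
```

On machines with few cores, setting an explicit cap avoids spawning one worker per page for long documents.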

Custom Prompts

Provide custom instructions for OCR:
const result = await zerox({
  filePath: 'technical-doc.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  prompt: `
    Additional instructions:
    - Preserve all code blocks with syntax highlighting
    - Convert diagrams to Mermaid syntax when possible
    - Keep all mathematical equations in LaTeX format
  `
});
prompt
string
Custom instructions appended to the system prompt. Use to provide domain-specific guidance.
The base system prompt is defined in constants.ts:
const SYSTEM_PROMPT_BASE = `
Convert the following document to markdown.
Return only the markdown with no explanation text.

RULES:
  - Include all information on the page
  - Return tables in HTML format
  - Interpret charts & infographics to markdown
  - Use ☐ and ☑ for checkboxes
  - Wrap logos: <logo>Brand Name</logo>
  - Wrap watermarks: <watermark>TEXT</watermark>
  - Wrap page numbers: <page_number>14</page_number>
`;

LLM Parameters

Fine-tune model behavior:
const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  llmParams: {
    temperature: 0.1,     // Lower = more deterministic
    maxTokens: 4096,      // Maximum output tokens
    topP: 0.9,           // Nucleus sampling
    frequencyPenalty: 0,  // Penalize frequent tokens
    presencePenalty: 0,   // Penalize repeated topics
    logprobs: true        // Return log probabilities
  }
});
llmParams
object
Model parameters. Available options depend on the provider.
OpenAI/Azure:
  • temperature (0-2): Randomness (default: 0.7)
  • maxTokens: Maximum output tokens
  • topP (0-1): Nucleus sampling
  • frequencyPenalty (-2 to 2): Penalize frequent tokens
  • presencePenalty (-2 to 2): Penalize repeated content
  • logprobs (boolean): Return log probabilities
Bedrock:
  • Similar to OpenAI, but no logprobs
Google:
  • Uses maxOutputTokens instead of maxTokens
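If one set of parameters is shared across providers, the differences listed above can be handled with a small adapter. This is a sketch only: adaptParams, the Provider type, and the LlmParams interface are hypothetical names built from the option list in this section, not Zerox exports, and Zerox may already perform some of this mapping itself.

```typescript
type Provider = "openai" | "azure" | "bedrock" | "google";

// Hypothetical shape, mirroring the options listed in this section.
interface LlmParams {
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  frequencyPenalty?: number;
  presencePenalty?: number;
  logprobs?: boolean;
}

// Hypothetical helper: adjust shared params to a provider's expectations.
function adaptParams(provider: Provider, params: LlmParams): Record<string, unknown> {
  if (provider === "bedrock") {
    // Bedrock: same options, but no logprobs.
    const { logprobs: _logprobs, ...rest } = params;
    return rest;
  }
  if (provider === "google") {
    // Google: maxOutputTokens replaces maxTokens.
    const { maxTokens, ...rest } = params;
    return maxTokens === undefined ? rest : { ...rest, maxOutputTokens: maxTokens };
  }
  return { ...params };
}

console.log(adaptParams("google", { temperature: 0.1, maxTokens: 4096 }));
// { temperature: 0.1, maxOutputTokens: 4096 }
```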

Custom Model Function

Provide a custom function for OCR processing:
import { zerox, CompletionResponse } from 'zerox-ocr';

const customModel = async ({
  buffers,
  image,
  maintainFormat,
  pageNumber,
  priorPage
}) => {
  // Your custom vision model logic
  const result = await myVisionModel(buffers[0]);
  
  return {
    content: result.text,
    inputTokens: result.inputTokens,
    outputTokens: result.outputTokens
  } as CompletionResponse;
};

const result = await zerox({
  filePath: 'document.pdf',
  customModelFunction: customModel
});
customModelFunction
function
Custom function for OCR processing. Receives:
  • buffers: Array of image buffers (may be split for tall images)
  • image: Path to the image file
  • maintainFormat: Whether to maintain formatting
  • pageNumber: Current page number
  • priorPage: Content from previous page (if maintainFormat: true)
Must return a CompletionResponse object.

Complete Advanced Example

import { zerox } from 'zerox-ocr';

const result = await zerox({
  // Input
  filePath: 'technical-report.pdf',
  pagesToConvertAsImages: [1, 2, 3, 4, 5],
  
  // Model configuration
  model: 'gpt-4o',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  llmParams: {
    temperature: 0.1,
    maxTokens: 4096,
    logprobs: true
  },
  
  // Image processing
  correctOrientation: true,
  trimEdges: true,
  imageFormat: 'png',
  imageDensity: 300,
  maxImageSize: 10,
  
  // OCR behavior
  maintainFormat: true,
  prompt: 'Preserve code blocks and technical diagrams',
  
  // Performance
  concurrency: 5,  // Ignored while maintainFormat is true (pages run sequentially)
  maxTesseractWorkers: 8,
  
  // Output
  outputDir: './output',
  tempDir: './temp',
  cleanup: true
});

console.log(`Processed ${result.pages.length} pages`);
console.log(`Total tokens: ${result.inputTokens + result.outputTokens}`);
