Advanced Options

Overview

Zerox provides advanced options to control image processing, formatting, and OCR behavior. This guide covers all advanced configuration parameters.

Format Preservation

Maintain Format

Preserve document formatting across pages for consistent output:

import { zerox } from 'zerox-ocr';

const result = await zerox({
  filePath: 'report.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  maintainFormat: true  // Preserve formatting across pages
});

maintainFormat

boolean

default:false

When true, Zerox processes pages sequentially and passes the previous page's output as context so that formatting (headers, tables, lists, etc.) stays consistent. This increases accuracy but reduces concurrency.

Important: Cannot be used with extractOnly mode.

With maintainFormat: true, pages are processed sequentially rather than in parallel. This ensures formatting consistency but increases processing time.

How It Works

When enabled, each page receives context from the previous page:

// Internal behavior (simplified)
let priorPage = "";

for (const page of pages) {
  const result = await processPage(page, {
    prompt: `Maintain consistent formatting with: ${priorPage}`
  });
  priorPage = result.content;
}
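The loop above shows why maintainFormat rules out parallelism: each prompt depends on the previous page's result. Below is a minimal self-contained sketch of that pattern. It assumes a hypothetical async `ocrPage` callback standing in for the model call, and `processSequentially` is likewise illustrative rather than Zerox's internal code.

```typescript
// Hypothetical stand-in for a per-page model call; not Zerox's internal API.
type OcrPage = (pageNumber: number, prompt: string) => Promise<string>;

// Sketch of maintainFormat-style processing: pages run one at a time and
// each prompt carries the markdown produced for the previous page.
async function processSequentially(
  pageCount: number,
  basePrompt: string,
  ocrPage: OcrPage
): Promise<string[]> {
  const results: string[] = [];
  let priorPage = "";
  for (let page = 1; page <= pageCount; page++) {
    const prompt = priorPage
      ? `${basePrompt}\nMaintain consistent formatting with: ${priorPage}`
      : basePrompt; // the first page has no prior context
    const content = await ocrPage(page, prompt); // must finish before the next page starts
    results.push(content);
    priorPage = content;
  }
  return results;
}
```

Because every iteration awaits the previous one, total time grows roughly linearly with page count, which is the trade-off described above.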

Image Processing

Orientation Correction

Automatically detect and correct image orientation:

const result = await zerox({
  filePath: 'scanned-document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  correctOrientation: true  // Default: true
});

correctOrientation

boolean

default:true

Uses Tesseract OCR to detect image orientation and rotates images as needed. Improves accuracy for scanned or rotated documents.

The system uses Tesseract workers to analyze each image:

// Determines optimal rotation (0°, 90°, 180°, 270°)
const rotation = await determineOptimalRotation(image);
console.log(`Rotating image ${rotation} degrees`);
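This page does not say how determineOptimalRotation scores the candidates. One plausible approach, shown here only as a sketch under that assumption, is to score each of the four rotations (for example by OCR confidence) and keep the best, preferring no rotation on ties. `pickRotation` and the scores are hypothetical.

```typescript
// The four candidate rotations mentioned above.
type Rotation = 0 | 90 | 180 | 270;
const CANDIDATES: Rotation[] = [0, 90, 180, 270];

// Hypothetical selection rule: keep the rotation with the highest score.
// Only a strictly higher score replaces the current best, so ties keep the
// earlier candidate (0 degrees, i.e. leave the image as-is).
function pickRotation(scores: Record<Rotation, number>): Rotation {
  let best: Rotation = 0;
  for (const rotation of CANDIDATES) {
    if (scores[rotation] > scores[best]) {
      best = rotation;
    }
  }
  return best;
}
```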

Edge Trimming

Remove whitespace and borders from images:

const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  trimEdges: true  // Default: true
});

trimEdges

boolean

default:true

Automatically trims whitespace around the content. Reduces token usage and improves focus on actual content.
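Edge trimming comes down to finding the bounding box of pixels that differ from the background and cropping to it. The sketch below assumes a grayscale image stored as rows of 0-255 values and treats anything at or above 250 as near-white background. Both choices, and the helper names, are illustrative and not Zerox's implementation.

```typescript
// Bounding box of the content region (inclusive pixel coordinates).
interface Box {
  top: number;
  bottom: number;
  left: number;
  right: number;
}

// Smallest box containing every pixel darker than `threshold`, or null if blank.
function contentBox(pixels: number[][], threshold = 250): Box | null {
  let top = Infinity;
  let bottom = -1;
  let left = Infinity;
  let right = -1;
  for (let y = 0; y < pixels.length; y++) {
    for (let x = 0; x < pixels[y].length; x++) {
      if (pixels[y][x] < threshold) { // darker than near-white: treat as content
        top = Math.min(top, y);
        bottom = Math.max(bottom, y);
        left = Math.min(left, x);
        right = Math.max(right, x);
      }
    }
  }
  return bottom === -1 ? null : { top, bottom, left, right };
}

// Crops the image to its content box; blank images are returned unchanged.
function trimWhitespace(pixels: number[][], threshold = 250): number[][] {
  const box = contentBox(pixels, threshold);
  if (box === null) return pixels;
  return pixels
    .slice(box.top, box.bottom + 1)
    .map((row) => row.slice(box.left, box.right + 1));
}
```

Fewer pixels sent to the model means fewer image tokens, which is where the token savings come from.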

Image Conversion Options

Image Format

Choose the format for converted images:

const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  imageFormat: 'jpeg'  // 'png' (default) or 'jpeg'
});

imageFormat

string

default:"png"

Format for converted images:

png: Higher quality, larger file size
jpeg: Smaller file size, slight quality loss

Image Density

Control the resolution of PDF-to-image conversion:

const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  imageDensity: 300  // DPI (dots per inch)
});

imageDensity

number

DPI for PDF-to-image conversion. Higher values produce clearer images but larger file sizes.

150 DPI: Low quality, faster processing
300 DPI: Standard quality (recommended)
600 DPI: High quality, slower processing
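Because DPI is pixels per inch, the rendered size of a page is its physical size multiplied by imageDensity. For a US Letter page (8.5 x 11 in), 150, 300, and 600 DPI give 1275 x 1650, 2550 x 3300, and 5100 x 6600 pixels. The helper below is just that arithmetic (`renderedSize` is illustrative, not part of Zerox).

```typescript
// Pixel dimensions of a page rendered at a given DPI (pixels per inch).
function renderedSize(widthInches: number, heightInches: number, dpi: number) {
  return {
    width: Math.round(widthInches * dpi),
    height: Math.round(heightInches * dpi),
  };
}

// US Letter is 8.5 x 11 inches.
for (const dpi of [150, 300, 600]) {
  const { width, height } = renderedSize(8.5, 11, dpi);
  console.log(`${dpi} DPI -> ${width} x ${height} px`);
}
```

Doubling the DPI quadruples the pixel count, which is why 600 DPI is noticeably slower and larger than 300 DPI.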

Image Height

Specify a fixed height for converted images:

const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  imageHeight: 2048  // Height in pixels
});

imageHeight

number

Fixed height for images in pixels. Width is adjusted to maintain aspect ratio. Useful for standardizing image sizes.
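Preserving the aspect ratio means the width scales by the same factor as the height: width = originalWidth x imageHeight / originalHeight. For a 2550 x 3300 px page (US Letter at 300 DPI), imageHeight: 2048 gives a width of about 1583 px. Rounding to the nearest pixel is an assumption in this sketch.

```typescript
// Scale an image to a fixed height while preserving its aspect ratio.
function scaleToHeight(width: number, height: number, targetHeight: number) {
  const scale = targetHeight / height;
  return { width: Math.round(width * scale), height: targetHeight };
}

console.log(scaleToHeight(2550, 3300, 2048)); // { width: 1583, height: 2048 }
```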

Image Compression

Automatically compress images to reduce size:

const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  maxImageSize: 10  // Maximum size in MB
});

maxImageSize

number

default:15

Maximum image size in MB. Images larger than this are automatically compressed. Set to 0 to disable compression.

The compression algorithm:

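// Simplified: start at quality 90 and lower it by 10 points per pass
// until the compressed image fits under maxSize (stopping after quality 20)
let quality = 90;
let compressedImage = image;
while (compressedImage.length > maxSize && quality >= 20) {
  compressedImage = await compress(image, quality); // always re-encode the original
  quality -= 10;
}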
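To run the compression loop without a real encoder, the sketch below injects a hypothetical `encode(quality)` callback that returns the image re-encoded at a given quality. It also handles maxImageSize: 0 as "compression disabled", as the option description states. The callback, the helper name, and treating 1 MB as 1024 x 1024 bytes are assumptions.

```typescript
// Hypothetical encoder: returns the image re-encoded at the given quality (1-100).
type Encode = (quality: number) => Uint8Array;

// Lowers quality 90, 80, ..., 20 until the encoded image fits under
// maxImageSizeMb. Returns the original bytes if they already fit or if
// compression is disabled (maxImageSizeMb === 0).
function compressToLimit(
  original: Uint8Array,
  maxImageSizeMb: number,
  encode: Encode
): Uint8Array {
  if (maxImageSizeMb === 0) return original;
  const maxBytes = maxImageSizeMb * 1024 * 1024; // assumes 1 MB = 1024 * 1024 bytes
  let result = original;
  let quality = 90;
  while (result.length > maxBytes && quality >= 20) {
    result = encode(quality);
    quality -= 10;
  }
  return result; // may still exceed the limit if quality 20 was not enough
}
```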

Page Selection

Process specific pages instead of the entire document:

// Process single page
const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  pagesToConvertAsImages: 5  // Only page 5
});

// Process multiple pages
const result2 = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  pagesToConvertAsImages: [1, 3, 5, 7]  // Specific pages
});

// Process all pages (default)
const result3 = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  pagesToConvertAsImages: -1  // All pages (default)
});

pagesToConvertAsImages

number | number[]

Pages to process:

-1: Process all pages (default)
number: Process single page (1-indexed)
number[]: Process specific pages

Invalid page numbers are filtered out automatically.
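The three accepted shapes can be normalized into a single list of page numbers. This is a sketch assuming the page count is known and that "invalid" means anything that is not an integer between 1 and the page count; `resolvePages` is a hypothetical helper, not Zerox's internal function.

```typescript
// Normalizes a pagesToConvertAsImages value into 1-indexed page numbers.
function resolvePages(pages: number | number[], totalPages: number): number[] {
  const isValid = (p: number) => Number.isInteger(p) && p >= 1 && p <= totalPages;
  if (pages === -1) {
    // -1 means every page: [1, 2, ..., totalPages]
    return Array.from({ length: totalPages }, (_, i) => i + 1);
  }
  const requested = Array.isArray(pages) ? pages : [pages];
  return requested.filter(isValid); // drop out-of-range or non-integer pages
}

console.log(resolvePages([1, 3, 5, 7], 5)); // [ 1, 3, 5 ]
```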

Concurrency Control

Control parallel processing of pages:

const result = await zerox({
  filePath: 'large-document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  concurrency: 5  // Process 5 pages at a time
});

concurrency

number

default:10

Maximum number of pages to process concurrently. Higher values increase speed but also the risk of hitting API rate limits.

Note: Ignored when maintainFormat: true (sequential processing).

See Performance Tuning for optimization strategies.
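A limit like concurrency is commonly implemented as a small worker pool: at most `limit` tasks are in flight, and each worker pulls the next item when it finishes. The sketch below shows that technique in isolation; `mapWithConcurrency` is a hypothetical helper, not the function Zerox uses internally. Results come back in input order regardless of completion order.

```typescript
// Runs `task` over `items` with at most `limit` tasks in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to start; safe because callbacks never run in parallel

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```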

Tesseract Workers

Configure Tesseract OCR worker pool:

const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  correctOrientation: true,
  maxTesseractWorkers: 8  // Maximum worker threads
});

maxTesseractWorkers

number

Maximum number of Tesseract worker threads for orientation detection:

-1: Unlimited (default)
number: Limit worker count

Only relevant when correctOrientation: true.

By default, Zerox creates 3 initial workers and scales up as needed:

// From constants.ts
const NUM_STARTING_WORKERS = 3;

// Workers scale based on document size
if (numPages > NUM_STARTING_WORKERS) {
  await addMoreWorkers(numPages - NUM_STARTING_WORKERS);
}
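Combining the starting pool with the maxTesseractWorkers cap might look like the sketch below. It assumes the target is one worker per page, capped by maxTesseractWorkers unless that is -1. That capping rule is an assumption rather than something taken from Zerox's source, and `workersToAdd` is a hypothetical helper.

```typescript
const NUM_STARTING_WORKERS = 3;

// How many workers to add beyond the starting pool, assuming one worker per
// page, optionally capped by maxTesseractWorkers (-1 = no cap).
function workersToAdd(numPages: number, maxTesseractWorkers: number): number {
  const target =
    maxTesseractWorkers === -1 ? numPages : Math.min(numPages, maxTesseractWorkers);
  return Math.max(0, target - NUM_STARTING_WORKERS);
}

console.log(workersToAdd(20, -1)); // 17
console.log(workersToAdd(20, 8)); // 5
console.log(workersToAdd(2, -1)); // 0
```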

Custom Prompts

Provide custom instructions for OCR:

const result = await zerox({
  filePath: 'technical-doc.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  prompt: `
    Additional instructions:
    - Preserve all code blocks with syntax highlighting
    - Convert diagrams to Mermaid syntax when possible
    - Keep all mathematical equations in LaTeX format
  `
});

prompt

string

Custom instructions appended to the system prompt. Use to provide domain-specific guidance.

The base system prompt is defined in constants.ts:

const SYSTEM_PROMPT_BASE = `
Convert the following document to markdown.
Return only the markdown with no explanation text.

RULES:
  - Include all information on the page
  - Return tables in HTML format
  - Interpret charts & infographics to markdown
  - Use ☐ and ☑ for checkboxes
  - Wrap logos: <logo>Brand Name</logo>
  - Wrap watermarks: <watermark>TEXT</watermark>
  - Wrap page numbers: <page_number>14</page_number>
`;

LLM Parameters

Fine-tune model behavior:

const result = await zerox({
  filePath: 'document.pdf',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  llmParams: {
    temperature: 0.1,     // Lower = more deterministic
    maxTokens: 4096,      // Maximum output tokens
    topP: 0.9,            // Nucleus sampling
    frequencyPenalty: 0,  // Penalize frequent tokens
    presencePenalty: 0,   // Penalize repeated topics
    logprobs: true        // Return log probabilities
  }
});

llmParams

object

Model parameters. Available options depend on the provider.

OpenAI/Azure:

temperature (0-2): Randomness (default: 0.7)
maxTokens: Maximum output tokens
topP (0-1): Nucleus sampling
frequencyPenalty (-2 to 2): Penalize frequent tokens
presencePenalty (-2 to 2): Penalize repeated content
logprobs (boolean): Return log probabilities

Bedrock:

Similar to OpenAI, but no logprobs

Google:

Uses maxOutputTokens instead of maxTokens
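The provider differences above can be expressed as a small mapping step. This sketch encodes only what this page states (Google uses maxOutputTokens instead of maxTokens; Bedrock has no logprobs). The `Provider` union and `toProviderParams` are hypothetical, and real provider SDKs may expect other field names.

```typescript
type Provider = "openai" | "azure" | "bedrock" | "google";

interface LlmParams {
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  frequencyPenalty?: number;
  presencePenalty?: number;
  logprobs?: boolean;
}

// Adapts generic llmParams to the provider differences listed above.
function toProviderParams(provider: Provider, params: LlmParams): Record<string, unknown> {
  const { maxTokens, logprobs, ...rest } = params;
  const out: Record<string, unknown> = { ...rest };
  if (maxTokens !== undefined) {
    // Google names the output-token limit maxOutputTokens
    out[provider === "google" ? "maxOutputTokens" : "maxTokens"] = maxTokens;
  }
  if (logprobs !== undefined && provider !== "bedrock") {
    out["logprobs"] = logprobs; // dropped for Bedrock, which does not support it
  }
  return out;
}
```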

Custom Model Function

Provide a custom function for OCR processing:

import { CompletionResponse } from 'zerox-ocr';

const customModel = async ({
  buffers,
  image,
  maintainFormat,
  pageNumber,
  priorPage
}) => {
  // Your custom vision model logic
  const result = await myVisionModel(buffers[0]);
  
  return {
    content: result.text,
    inputTokens: result.inputTokens,
    outputTokens: result.outputTokens
  } as CompletionResponse;
};

const result = await zerox({
  filePath: 'document.pdf',
  customModelFunction: customModel
});

customModelFunction

function

Custom function for OCR processing. Receives:

buffers: Array of image buffers (may be split for tall images)
image: Path to the image file
maintainFormat: Whether to maintain formatting
pageNumber: Current page number
priorPage: Content from previous page (if maintainFormat: true)

Must return a CompletionResponse object.
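The example above sends only buffers[0]. Since buffers may hold several slices of a tall page, a custom function might OCR every slice and merge the results. This sketch assumes a hypothetical `recognize(buffer)` vision call and declares a local interface with only the three CompletionResponse fields used on this page, so it runs without the library. Joining slices with a blank line is also an assumption.

```typescript
// Local stand-in for the CompletionResponse fields used on this page.
interface PageResult {
  content: string;
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical vision call that OCRs one image buffer.
type Recognize = (buffer: Uint8Array) => Promise<PageResult>;

// Builds a customModelFunction-style handler that OCRs every slice in order
// and merges the text and token counts into one page result.
function makeCustomModel(recognize: Recognize) {
  return async ({ buffers }: { buffers: Uint8Array[] }): Promise<PageResult> => {
    const parts: PageResult[] = [];
    for (const buffer of buffers) {
      parts.push(await recognize(buffer)); // sequential, so slice order is preserved
    }
    return {
      content: parts.map((p) => p.content).join("\n\n"),
      inputTokens: parts.reduce((sum, p) => sum + p.inputTokens, 0),
      outputTokens: parts.reduce((sum, p) => sum + p.outputTokens, 0),
    };
  };
}
```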

Complete Advanced Example

import { zerox } from 'zerox-ocr';

const result = await zerox({
  // Input
  filePath: 'technical-report.pdf',
  pagesToConvertAsImages: [1, 2, 3, 4, 5],
  
  // Model configuration
  model: 'gpt-4o',
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  llmParams: {
    temperature: 0.1,
    maxTokens: 4096,
    logprobs: true
  },
  
  // Image processing
  correctOrientation: true,
  trimEdges: true,
  imageFormat: 'png',
  imageDensity: 300,
  maxImageSize: 10,
  
  // OCR behavior
  maintainFormat: true,
  prompt: 'Preserve code blocks and technical diagrams',
  
  // Performance
  concurrency: 5,  // Ignored here because maintainFormat is true
  maxTesseractWorkers: 8,
  
  // Output
  outputDir: './output',
  tempDir: './temp',
  cleanup: true
});

console.log(`Processed ${result.pages.length} pages`);
console.log(`Total tokens: ${result.inputTokens + result.outputTokens}`);
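The example above sets both maintainFormat: true and concurrency: 5, and by the rules on this page the concurrency value has no effect there. A small pre-flight check can surface such interactions before a run. This sketch encodes only the constraints stated on this page; `checkOptions` and its messages are hypothetical, not part of Zerox.

```typescript
// Subset of zerox options involved in the constraints described on this page.
interface OptionSubset {
  maintainFormat?: boolean;
  extractOnly?: boolean;
  concurrency?: number;
  correctOrientation?: boolean;
  maxTesseractWorkers?: number;
}

// Errors describe combinations that cannot work; warnings describe ignored settings.
function checkOptions(opts: OptionSubset): { errors: string[]; warnings: string[] } {
  const errors: string[] = [];
  const warnings: string[] = [];
  if (opts.maintainFormat && opts.extractOnly) {
    errors.push("maintainFormat cannot be used with extractOnly");
  }
  if (opts.maintainFormat && opts.concurrency !== undefined) {
    warnings.push("concurrency is ignored when maintainFormat is true");
  }
  const correctOrientation = opts.correctOrientation ?? true; // documented default
  if (!correctOrientation && opts.maxTesseractWorkers !== undefined) {
    warnings.push("maxTesseractWorkers has no effect when correctOrientation is false");
  }
  return { errors, warnings };
}
```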

Next Steps

Error Handling - Configure retry and error strategies
Performance Tuning - Optimize processing speed
Data Extraction - Extract structured data
