ZeroxArgs Interface

All parameters for the zerox() function.

Required Parameters

filePath
string
required
Path to the file to process. Can be a local file path or a URL to a remote file.
Supported formats:
  • PDF (.pdf)
  • Images (.png, .jpg, .jpeg, .heic)
  • Structured data files (.xlsx, .xls, .csv)
  • Other document formats that can be converted to PDF

Authentication

credentials
ModelCredentials
Authentication credentials for the model provider. Required unless using openaiAPIKey.
Default: { apiKey: "" }
openaiAPIKey
string
Deprecated: Use credentials instead. OpenAI API key for authentication. When provided, automatically sets modelProvider to ModelProvider.OPENAI and credentials to { apiKey: openaiAPIKey }.
Default: ""
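The interaction between openaiAPIKey and credentials can be sketched as below. This is a minimal illustration of the documented rule, not the library's own code: resolveAuth, AuthArgs, and ResolvedAuth are hypothetical names, ModelCredentials is simplified to a single apiKey field, and the string "OPENAI" stands in for ModelProvider.OPENAI, whose actual value is not shown here.

```typescript
// Simplified local stand-in; the library's ModelCredentials may carry more fields.
interface ModelCredentials {
  apiKey: string;
}

// Hypothetical subset of ZeroxArgs covering only the authentication fields.
interface AuthArgs {
  credentials?: ModelCredentials;
  modelProvider?: string;
  openaiAPIKey?: string;
}

interface ResolvedAuth {
  credentials: ModelCredentials;
  modelProvider: string;
}

// Hypothetical helper: a non-empty openaiAPIKey overrides both the provider
// and the credentials, as the deprecation note describes.
function resolveAuth(args: AuthArgs): ResolvedAuth {
  if (args.openaiAPIKey) {
    return {
      credentials: { apiKey: args.openaiAPIKey },
      modelProvider: "OPENAI", // placeholder for ModelProvider.OPENAI
    };
  }
  // Otherwise fall back to the documented defaults.
  return {
    credentials: args.credentials ?? { apiKey: "" },
    modelProvider: args.modelProvider ?? "OPENAI",
  };
}
```

Under this reading, passing openaiAPIKey alongside a different modelProvider still results in the OpenAI provider, which is one reason to prefer credentials in new code.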

Model Configuration

model
ModelOptions | string
The model to use for OCR processing.
Available options:
  • ModelOptions.OPENAI_GPT_4_1 - "gpt-4.1"
  • ModelOptions.OPENAI_GPT_4_1_MINI - "gpt-4.1-mini"
  • ModelOptions.OPENAI_GPT_4O - "gpt-4o"
  • ModelOptions.OPENAI_GPT_4O_MINI - "gpt-4o-mini"
  • ModelOptions.BEDROCK_CLAUDE_3_HAIKU_2024_10
  • ModelOptions.BEDROCK_CLAUDE_3_SONNET_2024_10
  • ModelOptions.GOOGLE_GEMINI_2_5_PRO
  • ModelOptions.GOOGLE_GEMINI_2_FLASH
  • And more (see Types)
Default: ModelOptions.OPENAI_GPT_4O
modelProvider
ModelProvider | string
The model provider to use.
Options:
  • ModelProvider.OPENAI
  • ModelProvider.AZURE
  • ModelProvider.BEDROCK
  • ModelProvider.GOOGLE
Default: ModelProvider.OPENAI
llmParams
Partial<LLMParams>
Additional parameters for the language model.
Default: {}
customModelFunction
function
Custom function to handle model completion. Useful for implementing custom model providers or adding preprocessing logic.
(params: {
  buffers: Buffer[];
  image: string;
  maintainFormat: boolean;
  pageNumber: number;
  priorPage: string;
}) => Promise<CompletionResponse>

Extraction Configuration

schema
Record<string, unknown>
JSON Schema for structured data extraction. When provided, Zerox will extract structured data matching this schema.
{
  type: 'object',
  properties: {
    fieldName: { type: 'string' },
    amount: { type: 'number' }
  }
}
extractOnly
boolean
When true, skips OCR and only performs structured data extraction. Requires schema to be provided. This mode automatically enables directImageExtraction.
Default: false
extractPerPage
string[]
List of schema property names to extract on a per-page basis rather than from the full document.
extractPerPage: ['pageNumber', 'section']
enableHybridExtraction
boolean
When true, uses both OCR text and images for extraction, improving accuracy.
Requirements:
  • Requires schema to be provided
  • Cannot be used with directImageExtraction or extractOnly
Default: false
directImageExtraction
boolean
When true, performs extraction directly from images without OCR text.
Default: false
extractionModel
ModelOptions | string
Model to use for extraction. If not specified, uses the same model as OCR.
extractionModelProvider
ModelProvider | string
Model provider for extraction. If not specified, uses the same provider as OCR.
extractionCredentials
ModelCredentials
Credentials for the extraction model. If not specified, uses the same credentials as OCR.
extractionLlmParams
Partial<LLMParams>
LLM parameters for extraction. If not specified, uses the same parameters as OCR.
extractionPrompt
string
Custom prompt for the extraction process.
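Several of the extraction flags constrain one another. The sketch below collects the documented rules into a single pre-flight check that could run before calling zerox(). It is an illustration only: findExtractionConflicts and ExtractionFlags are hypothetical names, and the library may report these conflicts differently or not at all.

```typescript
// Hypothetical subset of ZeroxArgs holding only the flags that interact.
interface ExtractionFlags {
  schema?: Record<string, unknown>;
  extractOnly?: boolean;
  enableHybridExtraction?: boolean;
  directImageExtraction?: boolean;
  maintainFormat?: boolean;
}

// Returns one message per documented rule that the arguments violate.
function findExtractionConflicts(args: ExtractionFlags): string[] {
  const problems: string[] = [];

  if (args.extractOnly && !args.schema) {
    problems.push("extractOnly requires schema");
  }
  if (args.enableHybridExtraction) {
    if (!args.schema) {
      problems.push("enableHybridExtraction requires schema");
    }
    if (args.directImageExtraction || args.extractOnly) {
      problems.push(
        "enableHybridExtraction cannot be combined with directImageExtraction or extractOnly"
      );
    }
  }
  // Documented under maintainFormat in the OCR Configuration section.
  if (args.maintainFormat && args.extractOnly) {
    problems.push("maintainFormat cannot be used with extractOnly");
  }
  return problems;
}
```

An empty array means none of the documented constraints are violated; it does not guarantee the call will succeed.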

OCR Configuration

prompt
string
Custom prompt to guide the OCR process. Use this to provide specific instructions about how to extract or format the text.
maintainFormat
boolean
When true, processes pages sequentially to maintain formatting consistency across pages.
Requirements:
  • Cannot be used with extractOnly mode
  • Disables concurrent processing
Default: false
correctOrientation
boolean
When true, automatically detects and corrects page orientation using Tesseract OCR.
Default: true
trimEdges
boolean
When true, automatically trims white edges from images before processing.
Default: true

Image Processing

imageDensity
number
DPI (dots per inch) for PDF to image conversion. Higher values produce better quality but larger files.
Typical values: 150-300
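To make the quality and size trade-off concrete, the arithmetic below shows how pixel dimensions scale with density for a US Letter page (8.5 x 11 inches). rasterSize is a hypothetical helper written for this illustration; it only does the unit conversion and says nothing about how Zerox rasterizes pages internally.

```typescript
// Pixel dimensions of a page rendered at a given density (dots per inch).
function rasterSize(widthInches: number, heightInches: number, dpi: number) {
  const widthPx = Math.round(widthInches * dpi);
  const heightPx = Math.round(heightInches * dpi);
  return { widthPx, heightPx, pixels: widthPx * heightPx };
}

const letterAt150 = rasterSize(8.5, 11, 150); // 1275 x 1650 pixels
const letterAt300 = rasterSize(8.5, 11, 300); // 2550 x 3300 pixels

// Doubling the density quadruples the pixel count.
const growthFactor = letterAt300.pixels / letterAt150.pixels; // 4
```

Because the pixel count grows with the square of the density, moving from 150 to 300 can push large pages toward the maxImageSize limit.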
imageHeight
number
Target height in pixels for converted images. Width is adjusted proportionally.
imageFormat
'png' | 'jpeg'
Format for converted images.
Default: "png"
maxImageSize
number
Maximum image size in megabytes. Images larger than this will be compressed.
Default: 15
pagesToConvertAsImages
number | number[]
Specify which pages to process:
  • -1: Process all pages (default)
  • number: Process a single page
  • number[]: Process specific pages (e.g., [1, 3, 5])
Default: -1
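The three accepted shapes of pagesToConvertAsImages can be normalized into a plain list of page numbers, as in the sketch below. resolvePages is a hypothetical helper, not part of the library. It assumes pages are numbered from 1 (suggested by the [1, 3, 5] example but not stated) and silently drops out-of-range entries, whereas the library may handle those differently.

```typescript
type PageSelection = number | number[];

// Hypothetical helper: expands -1, a single page, or a list of pages
// into an explicit list, assuming 1-based page numbers.
function resolvePages(selection: PageSelection, totalPages: number): number[] {
  if (Array.isArray(selection)) {
    // Keep only whole page numbers that exist in the document.
    return selection.filter(
      (page) => Number.isInteger(page) && page >= 1 && page <= totalPages
    );
  }
  if (selection === -1) {
    // -1 means every page.
    return Array.from({ length: totalPages }, (_, index) => index + 1);
  }
  // A single number is treated as a one-element list.
  return resolvePages([selection], totalPages);
}

const allPages = resolvePages(-1, 3);          // [1, 2, 3]
const singlePage = resolvePages(2, 3);         // [2]
const somePages = resolvePages([1, 3, 5], 4);  // [1, 3]
```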

Performance & Reliability

concurrency
number
Maximum number of pages to process concurrently. Higher values increase speed but use more resources.
Default: 10
maxRetries
number
Maximum number of retry attempts for failed operations.
Default: 1
errorMode
ErrorMode
How to handle errors during processing:
  • ErrorMode.IGNORE: Continue processing remaining pages (default)
  • ErrorMode.THROW: Throw an error immediately on failure
Default: ErrorMode.IGNORE
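How maxRetries and errorMode combine can be pictured with the simplified, synchronous sketch below. It is not the library's implementation: runPages, PageResult, and ErrorModeName are hypothetical, real page processing is asynchronous, and the sketch assumes maxRetries counts retries after the first attempt, which the text above does not specify.

```typescript
// String stand-ins for ErrorMode.IGNORE and ErrorMode.THROW.
type ErrorModeName = "IGNORE" | "THROW";

interface PageResult {
  page: number;
  content?: string;
  error?: string;
}

// Hypothetical runner: retries each page, then either records the failure
// and moves on (IGNORE) or aborts the whole run (THROW).
function runPages(
  pages: number[],
  processPage: (page: number) => string,
  maxRetries: number,
  errorMode: ErrorModeName
): PageResult[] {
  const results: PageResult[] = [];
  for (const page of pages) {
    let succeeded = false;
    let lastError: unknown;
    // Assumption: one initial attempt plus up to maxRetries retries.
    for (let attempt = 0; attempt <= maxRetries && !succeeded; attempt++) {
      try {
        results.push({ page, content: processPage(page) });
        succeeded = true;
      } catch (err) {
        lastError = err;
      }
    }
    if (succeeded) continue;
    const message = lastError instanceof Error ? lastError.message : String(lastError);
    if (errorMode === "THROW") {
      throw new Error(`page ${page} failed: ${message}`);
    }
    results.push({ page, error: message });
  }
  return results;
}
```

With IGNORE, a caller would still need to inspect each result for an error field, since a failed page does not stop the run.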
maxTesseractWorkers
number
Maximum number of Tesseract OCR workers for orientation correction.
  • -1: Unlimited (default)
  • number: Limit to specific count
Default: -1

File Management

outputDir
string
Directory to save the aggregated markdown output. If not specified, output is only returned in the response.
tempDir
string
Directory for temporary files during processing.
Default: os.tmpdir()
cleanup
boolean
When true, automatically removes temporary files after processing.
Default: true

Example with Multiple Parameters

import { zerox, ModelProvider, ErrorMode } from 'zerox';

const result = await zerox({
  filePath: './document.pdf',
  credentials: {
    apiKey: process.env.OPENAI_API_KEY
  },
  model: 'gpt-4o',
  modelProvider: ModelProvider.OPENAI,
  llmParams: {
    temperature: 0.1,
    maxTokens: 4000
  },
  concurrency: 5, // has no effect here: maintainFormat below disables concurrent processing
  maxRetries: 3,
  errorMode: ErrorMode.THROW,
  correctOrientation: true,
  trimEdges: true,
  maintainFormat: true,
  maxImageSize: 10,
  outputDir: './output',
  cleanup: true,
  schema: {
    type: 'object',
    properties: {
      title: { type: 'string' },
      sections: {
        type: 'array',
        items: { type: 'string' }
      }
    }
  }
});