Parameters

ZeroxArgs Interface

All parameters for the zerox() function.

Required Parameters

filePath

string

required

Path to the file to process. Can be a local file path or a URL to a remote file.

Supported formats:

PDF (.pdf)
Images (.png, .jpg, .jpeg, .heic)
Structured data files (.xlsx, .xls, .csv)
Other document formats that can be converted to PDF

Authentication

credentials

ModelCredentials

Authentication credentials for the model provider. Required unless using openaiAPIKey.

ModelCredentials types:

OpenAICredentials

{
  apiKey: string;
}

AzureCredentials

{
  apiKey: string;
  endpoint: string;
}

BedrockCredentials

{
  region: string;
  accessKeyId?: string;
  secretAccessKey?: string;
  sessionToken?: string;
}

GoogleCredentials

{
  apiKey: string;
}

Default: { apiKey: "" }
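The credential shapes above differ in which fields are mandatory, so a pre-flight check can catch a missing field before any request is made. The following is a minimal sketch, not part of Zerox: the types are local mirrors of the shapes listed above, and `ProviderName`, `REQUIRED_FIELDS`, and `missingCredentialFields` are hypothetical names introduced for illustration.

```typescript
// Local mirrors of the credential shapes listed above; not imported from zerox.
type OpenAICredentials = { apiKey: string };
type AzureCredentials = { apiKey: string; endpoint: string };
type BedrockCredentials = {
  region: string;
  accessKeyId?: string;
  secretAccessKey?: string;
  sessionToken?: string;
};
type GoogleCredentials = { apiKey: string };
type ModelCredentials =
  | OpenAICredentials
  | AzureCredentials
  | BedrockCredentials
  | GoogleCredentials;

// Hypothetical provider labels used only by this sketch.
type ProviderName = "openai" | "azure" | "bedrock" | "google";

// The non-optional fields of each shape above.
const REQUIRED_FIELDS: Record<ProviderName, string[]> = {
  openai: ["apiKey"],
  azure: ["apiKey", "endpoint"],
  bedrock: ["region"],
  google: ["apiKey"],
};

// Returns the required fields that are absent or empty for the given provider.
function missingCredentialFields(
  provider: ProviderName,
  credentials: ModelCredentials
): string[] {
  const values = credentials as Record<string, unknown>;
  return REQUIRED_FIELDS[provider].filter((field) => {
    const value = values[field];
    return typeof value !== "string" || value.length === 0;
  });
}
```

Under this check, the default `{ apiKey: "" }` counts as missing an API key, because an empty string is treated the same as an absent field.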

openaiAPIKey

string

Deprecated: Use credentials instead. OpenAI API key for authentication. When provided, automatically sets modelProvider to ModelProvider.OPENAI and credentials to { apiKey: openaiAPIKey }.

Default: ""
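The override described above can be pictured as a small normalization step. This is a sketch of the documented behavior only, not the library's source: `AuthArgs` and `resolveAuth` are hypothetical names, and the string "OPENAI" stands in for ModelProvider.OPENAI, whose actual value is not shown on this page.

```typescript
// Hypothetical argument shape covering only the authentication fields.
type Credentials = Record<string, string | undefined>;

interface AuthArgs {
  credentials?: Credentials;
  modelProvider?: string;
  openaiAPIKey?: string;
}

// Applies the documented rule: a non-empty openaiAPIKey wins over
// credentials and modelProvider; otherwise the documented defaults apply.
function resolveAuth(args: AuthArgs): { credentials: Credentials; modelProvider: string } {
  if (args.openaiAPIKey) {
    return {
      credentials: { apiKey: args.openaiAPIKey },
      modelProvider: "OPENAI", // placeholder for ModelProvider.OPENAI
    };
  }
  return {
    credentials: args.credentials ?? { apiKey: "" },
    modelProvider: args.modelProvider ?? "OPENAI",
  };
}
```

One consequence worth noting: passing both openaiAPIKey and a non-OpenAI modelProvider results in OpenAI being used, which is a good reason to migrate to credentials.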

Model Configuration

model

ModelOptions | string

The model to use for OCR processing.

Available options:

ModelOptions.OPENAI_GPT_4_1 - "gpt-4.1"
ModelOptions.OPENAI_GPT_4_1_MINI - "gpt-4.1-mini"
ModelOptions.OPENAI_GPT_4O - "gpt-4o"
ModelOptions.OPENAI_GPT_4O_MINI - "gpt-4o-mini"
ModelOptions.BEDROCK_CLAUDE_3_HAIKU_2024_10
ModelOptions.BEDROCK_CLAUDE_3_SONNET_2024_10
ModelOptions.GOOGLE_GEMINI_2_5_PRO
ModelOptions.GOOGLE_GEMINI_2_FLASH
And more (see Types)

Default: ModelOptions.OPENAI_GPT_4O

modelProvider

ModelProvider | string

The model provider to use.

Options:

ModelProvider.OPENAI
ModelProvider.AZURE
ModelProvider.BEDROCK
ModelProvider.GOOGLE

Default: ModelProvider.OPENAI

llmParams

Partial<LLMParams>

Additional parameters for the language model.

LLMParams properties:

Common Parameters (all providers)

temperature?: number - Controls randomness (0-1)
topP?: number - Nucleus sampling parameter
frequencyPenalty?: number - Penalizes frequent tokens
presencePenalty?: number - Penalizes tokens based on presence

OpenAI/Azure Specific

maxTokens?: number - Maximum tokens to generate
logprobs?: boolean - Return log probabilities

Bedrock Specific

maxTokens?: number - Maximum tokens to generate

Google Specific

maxOutputTokens?: number - Maximum output tokens

Default: {}
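Because the token limit is named maxTokens for OpenAI, Azure, and Bedrock but maxOutputTokens for Google, code that switches providers needs to set the right key. The helper below is a hedged sketch: `ProviderName`, `SketchLlmParams`, and `withTokenLimit` are hypothetical names, and only the property names come from the list above.

```typescript
// Hypothetical provider labels used only by this sketch.
type ProviderName = "openai" | "azure" | "bedrock" | "google";

// A subset of the LLMParams properties listed above.
interface SketchLlmParams {
  temperature?: number;
  maxTokens?: number;
  maxOutputTokens?: number;
}

// Adds the token limit under the key that the given provider expects.
function withTokenLimit(
  provider: ProviderName,
  limit: number,
  base: SketchLlmParams = {}
): SketchLlmParams {
  return provider === "google"
    ? { ...base, maxOutputTokens: limit }
    : { ...base, maxTokens: limit };
}
```

The result can be passed as llmParams, for example `withTokenLimit("google", 4000, { temperature: 0.1 })`.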

customModelFunction

function

Custom function to handle model completion. Useful for implementing custom model providers or adding preprocessing logic.

(params: {
  buffers: Buffer[];
  image: string;
  maintainFormat: boolean;
  pageNumber: number;
  priorPage: string;
}) => Promise<CompletionResponse>

Extraction Configuration

schema

Record<string, unknown>

JSON Schema for structured data extraction. When provided, Zerox will extract structured data matching this schema.

{
  type: 'object',
  properties: {
    fieldName: { type: 'string' },
    amount: { type: 'number' }
  }
}

extractOnly

boolean

When true, skips OCR and only performs structured data extraction. Requires schema to be provided. This mode automatically enables directImageExtraction.

Default: false

extractPerPage

string[]

List of schema property names to extract on a per-page basis rather than from the full document.

extractPerPage: ['pageNumber', 'section']
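One way to picture the effect of extractPerPage is as a partition of the schema's properties into a per-page group and a full-document group. The sketch below only illustrates that partition; it is not how Zerox is implemented, and `ObjectSchema` and `splitSchema` are hypothetical names.

```typescript
// Minimal object-schema shape, matching the schema example in this section.
type ObjectSchema = { type: "object"; properties: Record<string, unknown> };

// Splits schema properties by whether their name appears in extractPerPage.
function splitSchema(
  schema: ObjectSchema,
  extractPerPage: string[]
): { perPage: ObjectSchema; fullDocument: ObjectSchema } {
  const perPage: ObjectSchema = { type: "object", properties: {} };
  const fullDocument: ObjectSchema = { type: "object", properties: {} };
  Object.keys(schema.properties).forEach((name) => {
    const target = extractPerPage.indexOf(name) >= 0 ? perPage : fullDocument;
    target.properties[name] = schema.properties[name];
  });
  return { perPage, fullDocument };
}
```

Names in extractPerPage that do not exist in the schema are simply ignored by this sketch; whether Zerox rejects them is not stated here.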

enableHybridExtraction

boolean

When true, uses both OCR text and images for extraction, improving accuracy.

Requirements:

Requires schema to be provided
Cannot be used with directImageExtraction or extractOnly

Default: false
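The constraints stated for extractOnly and enableHybridExtraction can be checked before calling zerox(). The following sketch encodes only the rules written in this section; `ExtractionFlags` and `extractionConflicts` are hypothetical names, and the messages are illustrative rather than the library's actual errors.

```typescript
// Only the options involved in the documented constraints.
interface ExtractionFlags {
  schema?: Record<string, unknown>;
  extractOnly?: boolean;
  enableHybridExtraction?: boolean;
  directImageExtraction?: boolean;
}

// Returns a list of violated constraints; an empty list means no conflict.
function extractionConflicts(flags: ExtractionFlags): string[] {
  const problems: string[] = [];
  if (flags.extractOnly && !flags.schema) {
    problems.push("extractOnly requires schema");
  }
  if (flags.enableHybridExtraction) {
    if (!flags.schema) {
      problems.push("enableHybridExtraction requires schema");
    }
    if (flags.directImageExtraction) {
      problems.push("enableHybridExtraction cannot be combined with directImageExtraction");
    }
    if (flags.extractOnly) {
      problems.push("enableHybridExtraction cannot be combined with extractOnly");
    }
  }
  return problems;
}
```

Running a check like this up front gives one combined report instead of failing on the first conflict.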

directImageExtraction

boolean

When true, performs extraction directly from images without OCR text.

Default: false

extractionModel

ModelOptions | string

Model to use for extraction. If not specified, uses the same model as OCR.

extractionModelProvider

ModelProvider | string

Model provider for extraction. If not specified, uses the same provider as OCR.

extractionCredentials

ModelCredentials

Credentials for the extraction model. If not specified, uses the same credentials as OCR.

extractionLlmParams

Partial<LLMParams>

LLM parameters for extraction. If not specified, uses the same parameters as OCR.

extractionPrompt

string

Custom prompt for the extraction process.

OCR Configuration

prompt

string

Custom prompt to guide the OCR process. Use this to provide specific instructions about how to extract or format the text.

maintainFormat

boolean

When true, processes pages sequentially to maintain formatting consistency across pages.

Requirements:

Cannot be used with extractOnly mode
Disables concurrent processing

Default: false

correctOrientation

boolean

When true, automatically detects and corrects page orientation using Tesseract OCR.

Default: true

trimEdges

boolean

When true, automatically trims white edges from images before processing.

Default: true

Image Processing

imageDensity

number

DPI (dots per inch) for PDF to image conversion. Higher values produce better quality but larger files.

Typical values: 150-300

imageHeight

number

Target height in pixels for converted images. Width is adjusted proportionally.
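To get a feel for these two settings, the arithmetic is straightforward: pixel size is physical size in inches times DPI, and scaling to a target height multiplies the width by the same ratio. The sketch below is plain arithmetic with hypothetical helper names; how Zerox combines imageDensity and imageHeight internally is not described here.

```typescript
interface PixelSize {
  width: number;
  height: number;
}

// Pixel dimensions of a page rendered at the given DPI.
function pixelsAtDensity(widthInches: number, heightInches: number, dpi: number): PixelSize {
  return {
    width: Math.round(widthInches * dpi),
    height: Math.round(heightInches * dpi),
  };
}

// Scales an image to a target height, keeping the aspect ratio.
function scaleToHeight(size: PixelSize, targetHeight: number): PixelSize {
  return {
    width: Math.round((size.width * targetHeight) / size.height),
    height: targetHeight,
  };
}

// US Letter (8.5 x 11 in) at 300 DPI is 2550 x 3300 pixels;
// scaled to a height of 2200 pixels it becomes 1700 x 2200.
const letterAt300 = pixelsAtDensity(8.5, 11, 300);
const scaled = scaleToHeight(letterAt300, 2200);
```

Doubling the DPI doubles both dimensions, so the pixel count (and roughly the uncompressed size) grows fourfold.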

imageFormat

'png' | 'jpeg'

Format for converted images.

Default: "png"

maxImageSize

number

Maximum image size in megabytes. Images larger than this will be compressed.

Default: 15

pagesToConvertAsImages

number | number[]

Specify which pages to process:

-1: Process all pages (default)
number: Process a single page
number[]: Process specific pages (e.g., [1, 3, 5])

Default: -1
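The three accepted forms can be normalized into a single list of page numbers. This is a sketch under two assumptions that this page does not confirm: pages are numbered from 1 (suggested by the `[1, 3, 5]` example), and out-of-range pages are dropped rather than rejected. `resolvePages` is a hypothetical helper name.

```typescript
// Expands pagesToConvertAsImages into an explicit list of page numbers.
// Assumes 1-based page numbers and silently drops out-of-range values.
function resolvePages(
  pagesToConvertAsImages: number | number[],
  pageCount: number
): number[] {
  if (pagesToConvertAsImages === -1) {
    const all: number[] = [];
    for (let page = 1; page <= pageCount; page++) {
      all.push(page);
    }
    return all;
  }
  const requested = Array.isArray(pagesToConvertAsImages)
    ? pagesToConvertAsImages
    : [pagesToConvertAsImages];
  return requested.filter((page) => page >= 1 && page <= pageCount);
}
```

For a four-page document, `resolvePages([1, 3, 5], 4)` yields `[1, 3]` under these assumptions.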

Performance & Reliability

concurrency

number

Maximum number of pages to process concurrently. Higher values increase speed but use more resources.

Default: 10

maxRetries

number

Maximum number of retry attempts for failed operations.

Default: 1

errorMode

ErrorMode

How to handle errors during processing:

ErrorMode.IGNORE: Continue processing remaining pages (default)
ErrorMode.THROW: Throw an error immediately on failure

Default: ErrorMode.IGNORE
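The difference between the two modes can be sketched as a function over per-page outcomes. This illustrates the documented behavior only: the string union below is a local stand-in for the ErrorMode enum (its real values are not shown on this page), and `PageOutcome` and `collectPages` are hypothetical names.

```typescript
// Local stand-in for ErrorMode; the string values are assumptions.
type ErrorModeName = "IGNORE" | "THROW";

// Hypothetical per-page result: either content or an error message.
interface PageOutcome {
  page: number;
  content?: string;
  error?: string;
}

// IGNORE keeps going and reports failures; THROW stops at the first one.
function collectPages(
  outcomes: PageOutcome[],
  errorMode: ErrorModeName
): { succeeded: PageOutcome[]; failed: PageOutcome[] } {
  const succeeded: PageOutcome[] = [];
  const failed: PageOutcome[] = [];
  outcomes.forEach((outcome) => {
    if (outcome.error === undefined) {
      succeeded.push(outcome);
      return;
    }
    if (errorMode === "THROW") {
      throw new Error(`Page ${outcome.page} failed: ${outcome.error}`);
    }
    failed.push(outcome);
  });
  return { succeeded, failed };
}
```

With the default mode a partially failed document still returns the pages that succeeded, so callers that need every page should either use ErrorMode.THROW or inspect the result for failures.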

maxTesseractWorkers

number

Maximum number of Tesseract OCR workers for orientation correction.

-1: Unlimited (default)
number: Limit to specific count

Default: -1

File Management

outputDir

string

Directory to save the aggregated markdown output. If not specified, output is only returned in the response.

tempDir

string

Directory for temporary files during processing.

Default: os.tmpdir()

cleanup

boolean

When true, automatically removes temporary files after processing.

Default: true

Example with Multiple Parameters

import { zerox, ModelProvider, ErrorMode } from 'zerox';

const result = await zerox({
  filePath: './document.pdf',
  credentials: {
    apiKey: process.env.OPENAI_API_KEY ?? '' // env vars may be undefined; apiKey expects a string
  },
  model: 'gpt-4o',
  modelProvider: ModelProvider.OPENAI,
  llmParams: {
    temperature: 0.1,
    maxTokens: 4000
  },
  concurrency: 5,
  maxRetries: 3,
  errorMode: ErrorMode.THROW,
  correctOrientation: true,
  trimEdges: true,
  maintainFormat: true, // pages are processed sequentially, so concurrency is not applied
  maxImageSize: 10,
  outputDir: './output',
  cleanup: true,
  schema: {
    type: 'object',
    properties: {
      title: { type: 'string' },
      sections: {
        type: 'array',
        items: { type: 'string' }
      }
    }
  }
});
