Overview

OpenAI provides vision-capable models that can process documents and images. Zerox supports OpenAI's GPT-4-class vision models (including GPT-4o and GPT-4.1) for OCR and data extraction tasks.

Credentials

To use OpenAI models, you need to provide your API key:
credentials.apiKey (string, required)
Your OpenAI API key. Can be obtained from the OpenAI API dashboard.

Environment Variable

You can store your API key as an environment variable:
export OPENAI_API_KEY="sk-..."

Supported Models

The following OpenAI models are available through Zerox:
Model          Model ID       Description
GPT-4.1        gpt-4.1        Latest GPT-4.1 model with vision capabilities
GPT-4.1 Mini   gpt-4.1-mini   Smaller, faster GPT-4.1 model
GPT-4o         gpt-4o         Optimized GPT-4 model with vision
GPT-4o Mini    gpt-4o-mini    Smaller, cost-effective GPT-4o model

Configuration

Basic Example

import { zerox } from "zerox";
import { ModelOptions, ModelProvider } from "zerox/node-zerox/dist/types";

const result = await zerox({
  filePath: "path/to/document.pdf",
  modelProvider: ModelProvider.OPENAI,
  model: ModelOptions.OPENAI_GPT_4O,
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});

With Environment Variable

import { zerox } from "zerox";

const result = await zerox({
  filePath: "path/to/document.pdf",
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});
When using environment variables, you can omit the modelProvider parameter. OpenAI is the default provider.

LLM Parameters

OpenAI models support the following optional parameters:
llmParams.temperature (number, default: 1)
Controls randomness in the output. Values range from 0 to 2. Lower values make output more focused and deterministic.

llmParams.maxTokens (number, default: 4096)
Maximum number of tokens to generate in the completion.

llmParams.topP (number, default: 1)
Nucleus sampling parameter. An alternative to temperature sampling. Values range from 0 to 1.

llmParams.frequencyPenalty (number, default: 0)
Number between -2.0 and 2.0. Positive values penalize tokens based on their frequency in the text so far.

llmParams.presencePenalty (number, default: 0)
Number between -2.0 and 2.0. Positive values penalize tokens based on whether they appear in the text so far.

llmParams.logprobs (boolean, default: false)
Whether to return log probabilities of the output tokens. Useful for confidence scoring.

Example with Parameters

import { zerox } from "zerox";
import { ModelOptions, ModelProvider } from "zerox/node-zerox/dist/types";

const result = await zerox({
  filePath: "path/to/document.pdf",
  modelProvider: ModelProvider.OPENAI,
  model: ModelOptions.OPENAI_GPT_4O_MINI,
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
  llmParams: {
    temperature: 0.3,
    maxTokens: 4096,
    topP: 0.9,
    logprobs: true,
  },
});

Data Extraction

OpenAI models support structured data extraction using JSON schemas:
import { zerox } from "zerox";
import { ModelOptions, ModelProvider } from "zerox/node-zerox/dist/types";

const schema = {
  type: "object",
  properties: {
    invoice_number: { type: "string" },
    date: { type: "string" },
    total: { type: "number" },
    items: {
      type: "array",
      items: {
        type: "object",
        properties: {
          description: { type: "string" },
          quantity: { type: "number" },
          price: { type: "number" },
        },
      },
    },
  },
};

const result = await zerox({
  filePath: "invoice.pdf",
  modelProvider: ModelProvider.OPENAI,
  model: ModelOptions.OPENAI_GPT_4O,
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
  extractOnly: true,
  schema: schema,
});

console.log(result.extracted);
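For type-safe access to the extracted fields, you can mirror the JSON schema above in a TypeScript interface. This is an illustrative sketch, not something Zerox generates for you; the `isInvoice` guard shown here is a deliberately minimal hypothetical check you would extend for production use:

```typescript
// Interface mirroring the invoice schema above, so result.extracted can be
// narrowed from an untyped value to a concrete shape at the call site.
interface InvoiceItem {
  description?: string;
  quantity?: number;
  price?: number;
}

interface Invoice {
  invoice_number?: string;
  date?: string;
  total?: number;
  items?: InvoiceItem[];
}

// Minimal runtime check before trusting the cast; all schema fields are
// optional, so this only verifies the value is a non-null object.
function isInvoice(value: unknown): value is Invoice {
  return typeof value === "object" && value !== null;
}

const extracted: unknown = { invoice_number: "A-1", total: 10 };
if (isInvoice(extracted)) {
  console.log(extracted.invoice_number, extracted.total);
}
```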

Error Handling

import { zerox } from "zerox";
import { ErrorMode, ModelProvider } from "zerox/node-zerox/dist/types";

try {
  const result = await zerox({
    filePath: "document.pdf",
    modelProvider: ModelProvider.OPENAI,
    credentials: {
      apiKey: process.env.OPENAI_API_KEY,
    },
    errorMode: ErrorMode.THROW,
  });
} catch (error) {
  console.error("OCR failed:", error);
}
Ensure your OpenAI API key has sufficient credits and appropriate permissions for vision API access. Vision models consume tokens based on image resolution and document complexity.
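For transient failures such as rate limits or timeouts, you can wrap the call in a retry helper. The `withRetry` function below is a hypothetical sketch (not part of Zerox), and its backoff schedule is illustrative:

```typescript
// Hypothetical retry helper for transient API failures.
// Retries with exponential backoff: baseDelayMs, 2x, 4x, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt; the final failure is rethrown below.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Usage would look like `const result = await withRetry(() => zerox({ ... }))`, leaving `errorMode: ErrorMode.THROW` in place so failures actually reach the retry loop.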

Best Practices

  • Use gpt-4o-mini for cost-effective processing of simple documents
  • Use gpt-4o or gpt-4.1 for complex layouts, tables, and detailed extraction
  • Set temperature: 0 for deterministic output when consistency is critical
  • Enable logprobs: true to access confidence scores for extraction quality assessment
  • Monitor token usage through the returned inputTokens and outputTokens fields
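The last practice above can be sketched as a small aggregation helper. This assumes each Zerox result exposes top-level inputTokens and outputTokens fields, as described above; the helper itself is illustrative, not part of the library:

```typescript
// Shape of the usage fields assumed to be present on each Zerox result.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Sum token usage across several OCR runs so a batch job can be budgeted.
function summarizeUsage(
  results: TokenUsage[],
): TokenUsage & { totalTokens: number } {
  const inputTokens = results.reduce((sum, r) => sum + r.inputTokens, 0);
  const outputTokens = results.reduce((sum, r) => sum + r.outputTokens, 0);
  return { inputTokens, outputTokens, totalTokens: inputTokens + outputTokens };
}
```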