Zerox supports multiple vision model providers, each with specific configuration requirements and capabilities.

Supported Providers

Zerox integrates with four major AI providers:
  • OpenAI: GPT-4 Vision models (GPT-4o, GPT-4.1)
  • Azure OpenAI: Enterprise deployment of OpenAI models
  • AWS Bedrock: Anthropic Claude models via AWS
  • Google AI: Gemini vision models

OpenAI

The default and most straightforward provider to configure.

Available Models

  • gpt-4.1 - Latest GPT-4 model
  • gpt-4.1-mini - Faster, more cost-effective option
  • gpt-4o - Optimized model (default)
  • gpt-4o-mini - Compact version

Configuration

import { zerox, ModelProvider, ModelOptions } from 'zerox';

const result = await zerox({
  filePath: 'document.pdf',
  modelProvider: ModelProvider.OPENAI,
  credentials: {
    apiKey: process.env.OPENAI_API_KEY || '',
  },
  model: ModelOptions.OPENAI_GPT_4O,
});

OpenAI-Specific Parameters

| Parameter        | Type    | Default | Description                    |
| ---------------- | ------- | ------- | ------------------------------ |
| temperature      | number  | 1.0     | Randomness (0-2)               |
| maxTokens        | number  | 4096    | Maximum output tokens          |
| topP             | number  | 1.0     | Nucleus sampling               |
| frequencyPenalty | number  | 0       | Reduce repetition (-2 to 2)    |
| presencePenalty  | number  | 0       | Encourage new topics (-2 to 2) |
| logprobs         | boolean | false   | Return token probabilities     |
Requests are sent to the Chat Completions API at https://api.openai.com/v1/chat/completions.
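A minimal sketch of passing these parameters through llmParams (the option used in the Advanced Configuration examples); the values here are illustrative, not recommendations:

```typescript
// Illustrative OpenAI llmParams object; field names match the table above.
const openAiLlmParams = {
  temperature: 0,       // deterministic output, useful for OCR
  maxTokens: 4096,      // maximum output tokens
  topP: 1.0,            // nucleus sampling
  frequencyPenalty: 0,  // -2 to 2
  presencePenalty: 0,   // -2 to 2
  logprobs: true,       // return token probabilities (OpenAI/Azure only)
};

// Passed to zerox alongside the provider config, e.g.:
// const result = await zerox({ ...providerConfig, llmParams: openAiLlmParams });
```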

Azure OpenAI

Enterprise-grade deployment with additional compliance and security features.

Available Models

Azure deployments use the same model names as OpenAI:
  • gpt-4.1
  • gpt-4.1-mini
  • gpt-4o (recommended)
  • gpt-4o-mini

Configuration

import { zerox, ModelProvider, ModelOptions } from 'zerox';

const result = await zerox({
  filePath: 'document.pdf',
  modelProvider: ModelProvider.AZURE,
  credentials: {
    apiKey: process.env.AZURE_API_KEY || '',
    endpoint: process.env.AZURE_ENDPOINT || '',
  },
  model: ModelOptions.OPENAI_GPT_4O, // Your deployment name
});

Azure-Specific Details

API Version: Fixed at 2024-10-21
Endpoint Format: https://{resource-name}.openai.azure.com
Deployment Name: The model parameter should match your Azure deployment name, not necessarily the underlying model name.
Azure uses the official AzureOpenAI client from the OpenAI SDK. Authentication and API versioning are handled automatically.

AWS Bedrock

Access Anthropic Claude models through AWS infrastructure.

Available Models

Claude 3.5 (Latest):
  • anthropic.claude-3-5-haiku-20241022-v1:0
  • anthropic.claude-3-5-sonnet-20240620-v1:0
  • anthropic.claude-3-5-sonnet-20241022-v2:0 (recommended)
Claude 3:
  • anthropic.claude-3-haiku-20240307-v1:0
  • anthropic.claude-3-opus-20240229-v1:0
  • anthropic.claude-3-sonnet-20240229-v1:0

Configuration

import { zerox, ModelProvider, ModelOptions } from 'zerox';

const result = await zerox({
  filePath: 'document.pdf',
  modelProvider: ModelProvider.BEDROCK,
  credentials: {
    region: process.env.AWS_REGION || 'us-east-1',
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
    sessionToken: process.env.AWS_SESSION_TOKEN, // Optional
  },
  model: ModelOptions.BEDROCK_CLAUDE_3_SONNET_2024_10,
});

Bedrock-Specific Parameters

| Parameter        | Type   | Default | Description           |
| ---------------- | ------ | ------- | --------------------- |
| temperature      | number | 1.0     | Randomness (0-1)      |
| maxTokens        | number | 4096    | Maximum output tokens |
| topP             | number | 1.0     | Nucleus sampling      |
| frequencyPenalty | number | 0       | Reduce repetition     |
| presencePenalty  | number | 0       | Encourage new topics  |
Bedrock uses the Anthropic Messages API format. The anthropic_version is automatically set to bedrock-2023-05-31.
Bedrock does not support logprobs. The parameter is ignored if provided.
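Because Bedrock's temperature range (0-1) is narrower than OpenAI's (0-2), a value tuned for one provider may be out of range for the other. A hypothetical helper (not part of Zerox) that keeps a temperature within Bedrock's bounds:

```typescript
// Hypothetical helper: clamp a temperature intended for OpenAI's 0-2
// range into Bedrock's 0-1 range before passing it in llmParams.
function clampForBedrock(temperature: number): number {
  return Math.min(Math.max(temperature, 0), 1);
}

// Usage: llmParams: { temperature: clampForBedrock(1.5) }  // → 1
```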

Google Gemini

Google’s latest multimodal AI models with native vision capabilities.

Available Models

Gemini 2 (Latest):
  • gemini-2.5-pro-preview-03-25 - Most capable
  • gemini-2.0-flash-001 - Fastest, recommended
  • gemini-2.0-flash-lite-preview-02-05 - Ultra-fast, lightweight
Gemini 1.5:
  • gemini-1.5-flash - Fast and efficient
  • gemini-1.5-flash-8b - Compact version
  • gemini-1.5-pro - High performance

Configuration

import { zerox, ModelProvider, ModelOptions } from 'zerox';

const result = await zerox({
  filePath: 'document.pdf',
  modelProvider: ModelProvider.GOOGLE,
  credentials: {
    apiKey: process.env.GEMINI_API_KEY || '',
  },
  model: ModelOptions.GOOGLE_GEMINI_2_FLASH,
});

Google-Specific Parameters

| Parameter       | Type   | Default | Description           |
| --------------- | ------ | ------- | --------------------- |
| temperature     | number | 1.0     | Randomness (0-2)      |
| maxOutputTokens | number | 4096    | Maximum output tokens |
| topP            | number | 1.0     | Nucleus sampling      |
| frequencyPenalty| number | 0       | Reduce repetition     |
| presencePenalty | number | 0       | Encourage new topics  |
Google models use a different parameter name: maxOutputTokens instead of maxTokens. Zerox handles this conversion automatically.
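The rename can be sketched as a small pure function; this is a simplified illustration, not Zerox's actual internal code:

```typescript
// Simplified sketch of the maxTokens → maxOutputTokens rename that
// Zerox performs for Gemini (actual internals may differ).
function toGeminiParams(
  params: { maxTokens?: number; temperature?: number; topP?: number }
) {
  const { maxTokens, ...rest } = params;
  return maxTokens === undefined
    ? rest
    : { ...rest, maxOutputTokens: maxTokens };
}
```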

Gemini-Specific Features

Image Positioning: For best results with Gemini, images are placed before text prompts in the message content.
JSON Mode: Extraction mode automatically sets:
  • responseMimeType: 'application/json'
  • responseSchema: <your schema>
Token Counting: Uses promptTokenCount and candidatesTokenCount from usage metadata.
Gemini does not support logprobs. The parameter is ignored if provided.
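To illustrate the JSON-mode behavior above: a schema passed to Zerox becomes Gemini's responseSchema while responseMimeType is set for you. The invoiceSchema below and its field names are hypothetical, for illustration only:

```typescript
// Hypothetical extraction schema (JSON Schema shape). Passed as `schema`,
// it is forwarded as responseSchema and responseMimeType is set to
// 'application/json' automatically.
const invoiceSchema = {
  type: 'object',
  properties: {
    invoiceNumber: { type: 'string' },
    total: { type: 'number' },
  },
  required: ['invoiceNumber', 'total'],
};
```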

Comparing Providers

Feature Support Matrix

| Feature         | OpenAI | Azure | Bedrock        | Gemini |
| --------------- | ------ | ----- | -------------- | ------ |
| OCR Mode        | ✅     | ✅    | ✅             | ✅     |
| Extraction Mode | ✅     | ✅    | ✅             | ✅     |
| Hybrid Mode     | ✅     | ✅    | ✅             | ✅     |
| Logprobs        | ✅     | ✅    | ❌             | ❌     |
| JSON Schema     | ✅     | ✅    | ✅ (via tools) | ✅     |
| Custom Prompts  | ✅     | ✅    | ✅             | ✅     |

Performance Considerations

Speed:
  • Fastest: Gemini 2.0 Flash, GPT-4o-mini
  • Balanced: Claude 3.5 Sonnet, GPT-4o
  • Highest Quality: GPT-4.1, Claude 3.5 Sonnet v2
Cost:
  • Most economical: Gemini Flash 8B, GPT-4o-mini
  • Mid-range: Claude 3.5 Haiku, Gemini 1.5 Flash
  • Premium: GPT-4.1, Claude 3 Opus
Context Length:
  • Largest: Gemini models (up to 1M tokens input)
  • Standard: OpenAI and Claude (128K+ tokens)

Advanced Configuration

Separate OCR and Extraction Models

Use different providers for OCR and extraction:
import { zerox, ModelProvider, ModelOptions } from 'zerox';

const result = await zerox({
  filePath: 'document.pdf',
  
  // OCR with OpenAI
  modelProvider: ModelProvider.OPENAI,
  credentials: { apiKey: process.env.OPENAI_API_KEY },
  model: ModelOptions.OPENAI_GPT_4O,
  llmParams: { temperature: 0 },
  
  // Extraction with Claude
  extractionModelProvider: ModelProvider.BEDROCK,
  extractionCredentials: {
    region: 'us-east-1',
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  },
  extractionModel: ModelOptions.BEDROCK_CLAUDE_3_SONNET_2024_10,
  extractionLlmParams: { temperature: 0, maxTokens: 8192 },
  
  schema: { /* ... */ },
});
This allows you to optimize for cost and performance by using faster models for OCR and more accurate models for extraction.

Custom Model Function

Implement your own model integration:
import { CompletionResponse } from 'zerox';

const customModel = async ({
  buffers,
  image,
  maintainFormat,
  pageNumber,
  priorPage,
}: {
  buffers: Buffer[];
  image: string;
  maintainFormat: boolean;
  pageNumber: number;
  priorPage: string;
}): Promise<CompletionResponse> => {
  // Your custom vision model API call
  const response = await yourCustomAPI(buffers[0]);
  
  return {
    content: response.text,
    inputTokens: response.usage.input,
    outputTokens: response.usage.output,
  };
};

const result = await zerox({
  filePath: 'document.pdf',
  customModelFunction: customModel,
  // credentials not required with custom function
});

Best Practices

Provider Selection Guidelines:
  1. OpenAI: Best for general use, reliable, good documentation
  2. Azure: Enterprise requirements, compliance, SLAs
  3. Bedrock: AWS ecosystem integration, cost optimization
  4. Gemini: Large documents, cost-sensitive workloads
Security Considerations:
  • Never commit API keys to version control
  • Use environment variables or secret management services
  • Rotate keys regularly
  • Monitor usage and set billing alerts
  • Use least-privilege IAM policies (Bedrock)
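One way to enforce the environment-variable practice above is a fail-fast startup check; requireEnv is a hypothetical helper, not part of Zerox:

```typescript
// Hypothetical helper: fail fast when a required credential is missing,
// instead of silently sending an empty API key to the provider.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim(); // also guards against stray whitespace in the key
}

// Usage: credentials: { apiKey: requireEnv('OPENAI_API_KEY') }
```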

Troubleshooting

Common Issues

OpenAI: “Invalid API Key”
  • Verify key is active at platform.openai.com
  • Check for extra whitespace in environment variable
  • Ensure key has proper permissions
Azure: “Deployment not found”
  • Verify model matches your deployment name exactly
  • Check endpoint URL format
  • Confirm deployment is in same region as endpoint
Bedrock: “Access Denied”
  • Enable Bedrock model access in AWS Console
  • Verify IAM permissions include bedrock:InvokeModel
  • Check region availability for specific models
  • Confirm credentials are for correct AWS account
Gemini: “API Key not valid”
  • Generate key at ai.google.dev
  • Verify API is enabled in Google Cloud Console
  • Check billing account is active

Error Handling

Wrap provider calls with try-catch:
try {
  const result = await zerox({
    filePath: 'document.pdf',
    // ... provider config
  });
} catch (error) {
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('Invalid API Key')) {
    console.error('Check your credentials');
  } else if (message.includes('rate limit')) {
    console.error('Rate limit exceeded, retry later');
  } else {
    console.error('Processing error:', error);
  }
}
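For the rate-limit case, retrying with exponential backoff is a common pattern. withRetry below is a hypothetical helper, shown as a sketch rather than a Zerox API:

```typescript
// Hypothetical retry helper: retries only when the error looks like a
// rate limit, with exponential backoff between attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      if (attempt >= maxAttempts || !message.includes('rate limit')) {
        throw error; // not retryable, or out of attempts
      }
      // Exponential backoff: 1s, 2s, 4s, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}

// Usage: const result = await withRetry(() => zerox({ filePath: 'document.pdf' }));
```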
