# Supported Providers
Zerox integrates with four major AI providers:

- OpenAI: GPT-4 Vision models (GPT-4o, GPT-4.1)
- Azure OpenAI: Enterprise deployment of OpenAI models
- AWS Bedrock: Anthropic Claude models via AWS
- Google AI: Gemini vision models
## OpenAI

The default and most straightforward provider to configure.

### Available Models

- gpt-4.1 - Latest GPT-4 model
- gpt-4.1-mini - Faster, more cost-effective option
- gpt-4o - Optimized model (default)
- gpt-4o-mini - Compact version
### Configuration
- Basic Setup
- With Custom Parameters
- Shorthand (Deprecated)
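The original tabbed code samples are not reproduced here, so the following is a rough sketch of a basic setup and one with custom parameters. Option names such as `modelProvider`, `credentials`, and `llmParams` are assumptions about the Zerox Node API; verify them against your installed version.

```typescript
// Hypothetical Zerox OpenAI setup; option names are assumptions, check
// them against your installed zerox version before use.
// import { zerox } from "zerox";

// Basic setup: provider, model, and API key.
const basicConfig = {
  filePath: "document.pdf",
  modelProvider: "OPENAI",
  model: "gpt-4o", // the default model
  credentials: { apiKey: process.env.OPENAI_API_KEY ?? "" },
};

// With custom parameters (see the parameter table for ranges and defaults).
const tunedConfig = {
  ...basicConfig,
  llmParams: {
    temperature: 0.2, // lower randomness for more deterministic OCR
    maxTokens: 4096,
    topP: 1.0,
  },
};

// const result = await zerox(tunedConfig);
```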
### OpenAI-Specific Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| temperature | number | 1.0 | Randomness (0-2) |
| maxTokens | number | 4096 | Maximum output tokens |
| topP | number | 1.0 | Nucleus sampling |
| frequencyPenalty | number | 0 | Reduce repetition (-2 to 2) |
| presencePenalty | number | 0 | Encourage new topics (-2 to 2) |
| logprobs | boolean | false | Return token probabilities |
OpenAI uses the https://api.openai.com/v1/chat/completions endpoint with the Chat Completions API.

## Azure OpenAI
Enterprise-grade deployment with additional compliance and security features.

### Available Models

Azure deployments use the same model names as OpenAI:

- gpt-4.1
- gpt-4.1-mini
- gpt-4o (recommended)
- gpt-4o-mini
### Configuration
- Basic Setup
- With Environment Variables
- Full Example
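The tab contents above are not preserved; a hedged sketch of an Azure setup follows. The `credentials.endpoint` shape and other option names are assumptions about the Zerox API, and the resource and deployment names are placeholders.

```typescript
// Hypothetical Zerox Azure OpenAI setup; option names are assumptions.
// import { zerox } from "zerox";

const azureConfig = {
  filePath: "document.pdf",
  modelProvider: "AZURE",
  // Should match your Azure *deployment* name, not necessarily the
  // underlying model name.
  model: "my-gpt-4o-deployment",
  credentials: {
    apiKey: process.env.AZURE_OPENAI_API_KEY ?? "",
    // Endpoint format: https://{resource-name}.openai.azure.com
    endpoint: "https://my-resource.openai.azure.com",
  },
};

// const result = await zerox(azureConfig);
```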
### Azure-Specific Details

- API Version: Fixed at 2024-10-21
- Endpoint Format: https://{resource-name}.openai.azure.com
- Deployment Name: The model parameter should match your Azure deployment name, not necessarily the underlying model name.
Azure uses the official AzureOpenAI client from the OpenAI SDK. Authentication and API versioning are handled automatically.

## AWS Bedrock

Access Anthropic Claude models through AWS infrastructure.

### Available Models
Claude 3.5 (Latest):

- anthropic.claude-3-5-haiku-20241022-v1:0
- anthropic.claude-3-5-sonnet-20240620-v1:0
- anthropic.claude-3-5-sonnet-20241022-v2:0 (recommended)

Claude 3:

- anthropic.claude-3-haiku-20240307-v1:0
- anthropic.claude-3-opus-20240229-v1:0
- anthropic.claude-3-sonnet-20240229-v1:0
### Configuration
- With IAM Credentials
- With Default Credentials
- Full Example
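Since the tabbed samples are missing, here is a rough sketch of a Bedrock setup with explicit IAM credentials. The `credentials` field names are assumptions about the Zerox API; adapt them to your installed version.

```typescript
// Hypothetical Zerox AWS Bedrock setup; option names are assumptions.
// import { zerox } from "zerox";

const bedrockConfig = {
  filePath: "document.pdf",
  modelProvider: "BEDROCK",
  model: "anthropic.claude-3-5-sonnet-20241022-v2:0", // recommended model
  credentials: {
    // Explicit IAM credentials; omitting these would typically fall back
    // to the default AWS credential chain.
    accessKeyId: process.env.AWS_ACCESS_KEY_ID ?? "",
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY ?? "",
    region: "us-east-1",
  },
};

// const result = await zerox(bedrockConfig);
```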
### Bedrock-Specific Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| temperature | number | 1.0 | Randomness (0-1) |
| maxTokens | number | 4096 | Maximum output tokens |
| topP | number | 1.0 | Nucleus sampling |
| frequencyPenalty | number | 0 | Reduce repetition |
| presencePenalty | number | 0 | Encourage new topics |
Bedrock uses the Anthropic Messages API format. The anthropic_version is automatically set to bedrock-2023-05-31.

## Google Gemini

Google’s latest multimodal AI models with native vision capabilities.

### Available Models
Gemini 2 (Latest):

- gemini-2.5-pro-preview-03-25 - Most capable
- gemini-2.0-flash-001 - Fastest, recommended
- gemini-2.0-flash-lite-preview-02-05 - Ultra-fast, lightweight

Gemini 1.5:

- gemini-1.5-flash - Fast and efficient
- gemini-1.5-flash-8b - Compact version
- gemini-1.5-pro - High performance
### Configuration
- Basic Setup
- With Custom Parameters
- Extraction Mode
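The original tab contents are not preserved; as a rough sketch of a Gemini setup with custom parameters (option names such as `modelProvider` and `llmParams` are assumptions about the Zerox API):

```typescript
// Hypothetical Zerox Gemini setup; option names are assumptions.
// import { zerox } from "zerox";

const geminiConfig = {
  filePath: "document.pdf",
  modelProvider: "GOOGLE",
  model: "gemini-2.0-flash-001", // fastest, recommended
  credentials: { apiKey: process.env.GEMINI_API_KEY ?? "" },
  llmParams: {
    temperature: 0.2,
    maxOutputTokens: 4096, // Gemini's equivalent of maxTokens
  },
};

// const result = await zerox(geminiConfig);
```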
### Google-Specific Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| temperature | number | 1.0 | Randomness (0-2) |
| maxOutputTokens | number | 4096 | Maximum output tokens |
| topP | number | 1.0 | Nucleus sampling |
| frequencyPenalty | number | 0 | Reduce repetition |
| presencePenalty | number | 0 | Encourage new topics |
Google models use a different parameter name: maxOutputTokens instead of maxTokens. Zerox handles this conversion automatically.

### Gemini-Specific Features

- Image Positioning: For best results with Gemini, images are placed before text prompts in the message content.
- JSON Mode: Extraction mode automatically sets responseMimeType: 'application/json' and responseSchema: <your schema>.
- Token Usage: Zerox reads promptTokenCount and candidatesTokenCount from the response's usage metadata.
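In extraction mode, those JSON settings mean you pass a schema and get structured output back; a rough sketch (the `schema` option name and its JSON Schema shape are assumptions about the Zerox API):

```typescript
// Hypothetical extraction-mode setup against Gemini; the schema option
// name is an assumption, verify against your installed zerox version.
const invoiceSchema = {
  type: "object",
  properties: {
    invoiceNumber: { type: "string" },
    total: { type: "number" },
  },
};

const extractionConfig = {
  filePath: "invoice.pdf",
  modelProvider: "GOOGLE",
  model: "gemini-2.0-flash-001",
  credentials: { apiKey: process.env.GEMINI_API_KEY ?? "" },
  // Zerox would derive responseMimeType and responseSchema from this.
  schema: invoiceSchema,
};

// const result = await zerox(extractionConfig);
```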
## Comparing Providers

### Feature Support Matrix
| Feature | OpenAI | Azure | Bedrock | Gemini |
|---|---|---|---|---|
| OCR Mode | ✅ | ✅ | ✅ | ✅ |
| Extraction Mode | ✅ | ✅ | ✅ | ✅ |
| Hybrid Mode | ✅ | ✅ | ✅ | ✅ |
| Logprobs | ✅ | ✅ | ❌ | ❌ |
| JSON Schema | ✅ | ✅ | ✅ (via tools) | ✅ |
| Custom Prompts | ✅ | ✅ | ✅ | ✅ |
### Performance Considerations

Speed:

- Fastest: Gemini 2.0 Flash, GPT-4o-mini
- Balanced: Claude 3.5 Sonnet, GPT-4o
- Highest Quality: GPT-4.1, Claude 3.5 Sonnet v2

Cost:

- Most economical: Gemini Flash 8B, GPT-4o-mini
- Mid-range: Claude 3.5 Haiku, Gemini 1.5 Flash
- Premium: GPT-4.1, Claude 3 Opus

Context Window:

- Largest: Gemini models (up to 1M tokens input)
- Standard: OpenAI and Claude (128K+ tokens)
## Advanced Configuration

### Separate OCR and Extraction Models
Use different providers for OCR and extraction. This allows you to optimize for cost and performance by using faster models for OCR and more accurate models for extraction.
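As a rough sketch of such a split, pairing a fast OCR model with a more accurate extraction model (the `extractionModelProvider`, `extractionModel`, and `extractionCredentials` names are assumptions about the Zerox API):

```typescript
// Hypothetical split OCR/extraction setup; the extraction* option names
// are assumptions, verify against your installed zerox version.
const splitConfig = {
  filePath: "report.pdf",
  // Fast, inexpensive model for the OCR pass.
  modelProvider: "GOOGLE",
  model: "gemini-2.0-flash-001",
  credentials: { apiKey: process.env.GEMINI_API_KEY ?? "" },
  // More accurate model for the structured-extraction pass.
  extractionModelProvider: "OPENAI",
  extractionModel: "gpt-4.1",
  extractionCredentials: { apiKey: process.env.OPENAI_API_KEY ?? "" },
};

// const result = await zerox(splitConfig);
```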
### Custom Model Function

Implement your own model integration when the built-in providers do not fit your needs.

## Best Practices
Provider Selection Guidelines:
- OpenAI: Best for general use, reliable, good documentation
- Azure: Enterprise requirements, compliance, SLAs
- Bedrock: AWS ecosystem integration, cost optimization
- Gemini: Large documents, cost-sensitive workloads
## Troubleshooting

### Common Issues
OpenAI: “Invalid API Key”

- Verify the key is active at platform.openai.com
- Check for extra whitespace in the environment variable
- Ensure the key has proper permissions

Azure OpenAI:

- Verify model matches your deployment name exactly
- Check the endpoint URL format
- Confirm the deployment is in the same region as the endpoint

AWS Bedrock:

- Enable Bedrock model access in the AWS Console
- Verify IAM permissions include bedrock:InvokeModel
- Check region availability for specific models
- Confirm credentials are for the correct AWS account

Google Gemini:

- Generate a key at ai.google.dev
- Verify the API is enabled in the Google Cloud Console
- Check the billing account is active

