Overview
Google Gemini provides powerful multimodal AI models through the Google AI Studio API. Zerox supports Gemini 1.5, 2.0, and 2.5 vision models for document processing and data extraction.
Credentials
To use Google Gemini models, you need an API key: your Google AI Studio API key, which can be obtained from Google AI Studio.
Environment Variable
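You can keep the key out of your source code by exporting it as an environment variable. GEMINI_API_KEY is an assumed name here; any name works as long as you pass the value to Zerox's credentials option.

```shell
# GEMINI_API_KEY is an assumed variable name -- use whatever name you
# later read from when building the Zerox credentials object.
export GEMINI_API_KEY="your-google-ai-studio-api-key"
```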
Supported Models
The following Gemini models are available through Zerox:
| Model | Model ID | Description |
|---|---|---|
| Gemini 2.5 Pro | gemini-2.5-pro-preview-03-25 | Latest Gemini 2.5 model (preview) |
| Gemini 2.0 Flash | gemini-2.0-flash-001 | Fast multimodal model |
| Gemini 2.0 Flash Lite | gemini-2.0-flash-lite-preview-02-05 | Lightweight Flash variant (preview) |
| Gemini 1.5 Pro | gemini-1.5-pro | Most capable 1.5 model |
| Gemini 1.5 Flash | gemini-1.5-flash | Fast and efficient |
| Gemini 1.5 Flash-8B | gemini-1.5-flash-8b | Smallest, fastest model |
Configuration
Basic Example
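A minimal call might look like the following sketch. The option names (filePath, credentials, model) follow the zerox npm package's README; the document path and the guard around the call are placeholders for your own setup.

```javascript
// Minimal sketch of a Zerox call with a Gemini model. Option names are
// taken from the zerox npm package's README; the file path is a placeholder.
const options = {
  filePath: "https://example.com/sample.pdf", // placeholder document
  credentials: { apiKey: process.env.GEMINI_API_KEY },
  model: "gemini-2.0-flash-001",
};

async function run() {
  const { zerox } = await import("zerox"); // requires the zerox package
  const result = await zerox(options);
  console.log(result.pages[0].content); // markdown for the first page
}

if (process.env.GEMINI_API_KEY) run(); // only run once a key is configured
```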
With Environment Variable
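If your key is stored in an environment variable (GEMINI_API_KEY is an assumed name), read it from process.env instead of hard-coding it:

```javascript
// Reads the API key from the environment rather than hard-coding it.
// GEMINI_API_KEY is an assumed variable name.
const apiKey = process.env.GEMINI_API_KEY;

const options = {
  filePath: "document.pdf", // placeholder
  credentials: { apiKey },
  model: "gemini-1.5-flash",
};

async function run() {
  const { zerox } = await import("zerox"); // requires the zerox package
  return zerox(options);
}

if (apiKey) run();
```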
LLM Parameters
Google Gemini models support the following optional parameters:
- temperature: Controls randomness in the output. Values range from 0 to 2; lower values make output more focused and deterministic.
- maxTokens: Maximum number of tokens to generate in the completion. Note: Gemini uses maxOutputTokens instead of maxTokens.
- topP: Nucleus sampling parameter. Values range from 0 to 1. Only temperature or topP should be modified, not both.
Parameters supported by some other providers, such as log probabilities, are not supported by Gemini models and are ignored if set.
Example with Parameters
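The parameters above are passed via llmParams (option name per the zerox README; treat it as an assumption for your version). The values below are illustrative:

```javascript
// Sketch of a call with LLM parameters. llmParams and its fields follow
// the zerox README; values here are illustrative.
const options = {
  filePath: "contract.pdf", // placeholder
  credentials: { apiKey: process.env.GEMINI_API_KEY },
  model: "gemini-1.5-pro",
  llmParams: {
    temperature: 0, // deterministic output
    maxTokens: 2048, // Zerox converts this to maxOutputTokens for Gemini
    // topP: 0.9,   // alternative to temperature; do not set both
  },
};

async function run() {
  const { zerox } = await import("zerox"); // requires the zerox package
  return zerox(options);
}

if (process.env.GEMINI_API_KEY) run();
```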
Gemini uses maxOutputTokens instead of maxTokens. Zerox automatically handles this conversion when using Google models.
Data Extraction
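A structured-extraction call might look like the following sketch. The schema option follows the zerox README's extraction example, and the result.extracted field is an assumption about the response shape; the invoice schema itself is hypothetical.

```javascript
// Hedged sketch of structured extraction with a JSON schema.
// The schema is a hypothetical invoice shape; the schema option and the
// result.extracted field are assumptions based on the zerox README.
const schema = {
  type: "object",
  properties: {
    invoiceNumber: { type: "string" },
    totalAmount: { type: "number" },
  },
  required: ["invoiceNumber", "totalAmount"],
};

const options = {
  filePath: "invoice.pdf", // placeholder
  credentials: { apiKey: process.env.GEMINI_API_KEY },
  model: "gemini-2.0-flash-001",
  schema, // enables JSON-schema extraction
};

async function run() {
  const { zerox } = await import("zerox"); // requires the zerox package
  const result = await zerox(options);
  console.log(result.extracted); // parsed JSON matching the schema (assumed field)
}

if (process.env.GEMINI_API_KEY) run();
```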
Gemini models support structured data extraction using JSON schemas. Gemini uses native JSON schema support with responseMimeType: "application/json" for structured extraction. The output is automatically parsed and validated.
Error Handling
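Safety blocks, rate limits, and transient API failures surface as thrown errors. A simple retry wrapper with exponential backoff (our own sketch, not part of the Zerox API) can smooth over the transient ones:

```javascript
// Generic retry helper with exponential backoff -- an illustration,
// not part of the Zerox API.
async function withRetry(fn, retries = 2) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === retries) throw err; // out of retries
      await new Promise((r) => setTimeout(r, 2 ** attempt * 1000)); // 1s, 2s, ...
    }
  }
}

// Usage with zerox (package assumed):
// const result = await withRetry(() => zerox(options));
```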
Image Processing
Gemini models have specific requirements for image input.
Image Format Support
Gemini supports the following image formats:
- PNG
- JPEG
- WebP
- HEIC
- HEIF
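You can screen files before sending them by checking extensions against the list above. The helper below is our own illustration (not part of Zerox), and the extension mapping is an assumption:

```javascript
// Maps the supported formats above to common file extensions.
// This helper is our own illustration; Zerox does not expose it.
const SUPPORTED_EXTENSIONS = new Set(["png", "jpg", "jpeg", "webp", "heic", "heif"]);

function isSupportedImage(filename) {
  const ext = filename.split(".").pop().toLowerCase();
  return SUPPORTED_EXTENSIONS.has(ext);
}

console.log(isSupportedImage("scan.HEIC")); // true
console.log(isSupportedImage("scan.tiff")); // false
```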
Rate Limits
Google Gemini API has the following rate limits (as of the latest update):
| Model | Requests per minute | Tokens per minute |
|---|---|---|
| Gemini 1.5 Pro | 10 | 4M input / 8K output |
| Gemini 1.5 Flash | 15 | 4M input / 8K output |
| Gemini 1.5 Flash-8B | 15 | 4M input / 8K output |
| Gemini 2.0 Flash | 10 | 4M input / 8K output |
Use the concurrency parameter to stay within limits:
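For example (the concurrency option name follows the zerox README; the limit value is illustrative):

```javascript
// Throttle page-level requests with the concurrency option (name per the
// zerox README); 5 is an illustrative value -- tune it to your quota.
const options = {
  filePath: "large-document.pdf", // placeholder
  credentials: { apiKey: process.env.GEMINI_API_KEY },
  model: "gemini-1.5-flash", // 15 requests per minute per the table above
  concurrency: 5, // at most 5 pages in flight at once
};

async function run() {
  const { zerox } = await import("zerox"); // requires the zerox package
  return zerox(options);
}

if (process.env.GEMINI_API_KEY) run();
```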
Context Window
Gemini models have large context windows:
- Gemini 1.5 Pro: Up to 2 million tokens
- Gemini 1.5 Flash: Up to 1 million tokens
- Gemini 2.0 Flash: Up to 1 million tokens
Token Usage Tracking
Gemini’s token usage is reported through usageMetadata.promptTokenCount (input) and usageMetadata.candidatesTokenCount (output). Zerox maps these to the standard inputTokens and outputTokens fields.
Best Practices
- Use gemini-1.5-flash-8b for cost-effective processing of simple documents
- Use gemini-1.5-pro or gemini-2.5-pro for complex documents with tables, charts, and detailed layouts
- Set temperature: 0 for deterministic output when consistency is critical
- Leverage Gemini’s large context window for processing multi-page documents
- Monitor token usage to optimize costs
- Use the concurrency parameter to manage rate limits effectively
Differences from Other Providers
Parameter Naming
- Gemini uses maxOutputTokens instead of maxTokens
- Log probabilities are not supported by Gemini
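The renaming can be pictured with a small sketch. This is our own illustration of the idea, not Zerox's actual source:

```javascript
// Illustration of the parameter translation performed for Google models:
// OpenAI-style names are mapped onto Gemini's generationConfig fields.
// Our own sketch, not Zerox's implementation.
function toGeminiGenerationConfig({ temperature, maxTokens, topP } = {}) {
  const config = {};
  if (temperature !== undefined) config.temperature = temperature;
  if (maxTokens !== undefined) config.maxOutputTokens = maxTokens; // renamed
  if (topP !== undefined) config.topP = topP;
  return config;
}

console.log(toGeminiGenerationConfig({ temperature: 0, maxTokens: 1024 }));
// → { temperature: 0, maxOutputTokens: 1024 }
```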
Message Format
- Images must follow text in prompt construction (Zerox handles this automatically)
- System prompts are included in the user message content
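The two rules above can be sketched as follows. This is our own illustration of the Gemini request shape, not Zerox's code:

```javascript
// Builds a Gemini-style "contents" array: the system prompt is folded
// into the user text, and the text part precedes the image part.
// Our own illustration of the rules above, not Zerox's code.
function buildContents(systemPrompt, userText, imageBase64) {
  return [
    {
      role: "user",
      parts: [
        { text: `${systemPrompt}\n\n${userText}` }, // system prompt merged in
        { inlineData: { mimeType: "image/png", data: imageBase64 } }, // image last
      ],
    },
  ];
}

const contents = buildContents("Convert this page to markdown.", "Page 1", "<base64>");
console.log(contents[0].parts[0].text.startsWith("Convert")); // true
```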
Safety Settings
- Gemini has built-in safety filters that may block certain content
- If you encounter blocked responses, review Google’s safety settings documentation

