
Overview

Google Gemini is Google's most capable family of AI models, offering multimodal capabilities across text, vision, audio, and code. Access Gemini through Portkey for advanced reasoning, long-context understanding, and function calling.

Base URL: https://generativelanguage.googleapis.com

Supported Features

  • ✅ Chat Completions (including streaming)
  • ✅ Embeddings
  • ✅ Function Calling
  • ✅ Vision (Image and Video inputs)
  • ✅ Audio Understanding
  • ✅ Long Context (up to 2M tokens)
  • ✅ JSON Mode
  • ✅ System Instructions
  • ❌ Image Generation (use Vertex AI)
  • ❌ Fine-tuning (use Vertex AI)

Quick Start

Chat Completions

from portkey_ai import Portkey

client = Portkey(
    provider="google",
    api_key="***"  # Your Google AI Studio API key
)

response = client.chat.completions.create(
    model="gemini-2.0-flash-exp",
    messages=[
        {"role": "user", "content": "Explain how Gemini differs from other AI models"}
    ]
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="gemini-2.0-flash-exp",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Available Models

Gemini 2.0 (Latest)

| Model | Context Window | Description | Best For |
| --- | --- | --- | --- |
| gemini-2.0-flash-exp | 1M tokens | Latest experimental Gemini 2.0 | General purpose, fast |
| gemini-2.0-flash-thinking-exp | 32K tokens | Reasoning model (experimental) | Complex problem solving |

Gemini 1.5

| Model | Context Window | Description | Best For |
| --- | --- | --- | --- |
| gemini-1.5-pro | 2M tokens | Most capable Gemini 1.5 | Complex tasks, long context |
| gemini-1.5-flash | 1M tokens | Fast, efficient model | High-throughput applications |
| gemini-1.5-flash-8b | 1M tokens | Smallest, fastest | Cost-effective tasks |

Embeddings

| Model | Dimensions | Description |
| --- | --- | --- |
| text-embedding-004 | 768 | Latest embedding model |
| text-multilingual-embedding-002 | 768 | Multilingual support |

Gemini models excel at:
  • Long context understanding (up to 2M tokens)
  • Multimodal reasoning (text, images, video, audio)
  • Code generation and analysis
  • Multilingual capabilities

Configuration Options

Getting Your API Key

  1. Go to Google AI Studio
  2. Click Get API Key
  3. Create or select a project
  4. Copy your API key
client = Portkey(
    provider="google",
    api_key="AIza***"  # Your Google AI Studio API key
)

Advanced Features

Vision (Image Understanding)

response = client.chat.completions.create(
    model="gemini-2.0-flash-exp",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/image.jpg"
                }
            }
        ]
    }]
)
Base64 images:
import base64

with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gemini-2.0-flash-exp",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}
            }
        ]
    }]
)

Function Calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gemini-2.0-flash-exp",
    messages=[{"role": "user", "content": "Search for the latest AI news"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
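To complete the loop, parse the JSON arguments string and dispatch to a local handler. This is a minimal sketch; `search_web` below is a hypothetical stand-in for your real tool implementation:

```python
import json

# Hypothetical local implementation of the search_web tool declared above.
def search_web(query: str) -> str:
    return f"Top results for: {query}"

# Map tool names to handlers; tool_call.function.arguments arrives as a JSON string.
handlers = {"search_web": search_web}

def run_tool_call(name: str, arguments: str) -> str:
    args = json.loads(arguments)
    return handlers[name](**args)

result = run_tool_call("search_web", '{"query": "latest AI news"}')
print(result)  # → Top results for: latest AI news
```

In a full agent loop you would append this result as a `tool` role message and call the model again.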

System Instructions

response = client.chat.completions.create(
    model="gemini-2.0-flash-exp",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful Python programming expert. Always provide working code examples."
        },
        {
            "role": "user",
            "content": "How do I read a JSON file?"
        }
    ]
)

Long Context Processing

Gemini excels at processing very long documents:
# Process a very long document (up to 2M tokens with Gemini 1.5 Pro)
long_document = """[Your very long document here - up to 2 million tokens]"""

response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {"role": "user", "content": f"Summarize this document:\n\n{long_document}"}
    ]
)

JSON Mode

response = client.chat.completions.create(
    model="gemini-2.0-flash-exp",
    messages=[{
        "role": "user",
        "content": "List 3 colors with their hex codes"
    }],
    response_format={"type": "json_object"}
)

import json
result = json.loads(response.choices[0].message.content)
print(result)

Embeddings

response = client.embeddings.create(
    model="text-embedding-004",
    input="Gemini is Google's most capable AI model"
)

embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
Batch embeddings:
response = client.embeddings.create(
    model="text-embedding-004",
    input=[
        "First document to embed",
        "Second document to embed",
        "Third document to embed"
    ]
)

for i, item in enumerate(response.data):
    print(f"Document {i}: {len(item.embedding)} dimensions")
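Embedding vectors are typically compared with cosine similarity for semantic search. A self-contained sketch with placeholder 3-dimensional vectors (real text-embedding-004 vectors have 768 dimensions):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors standing in for embeddings returned by the API.
query_vec = [0.1, 0.2, 0.3]
doc_vecs = {"doc_a": [0.1, 0.2, 0.25], "doc_b": [-0.3, 0.1, 0.0]}

# Pick the document whose embedding is closest to the query embedding.
best = max(doc_vecs, key=lambda k: cosine_similarity(query_vec, doc_vecs[k]))
print(best)  # → doc_a
```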

Fallback Configuration

Fall back to GPT-4o if Gemini fails:
config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {
            "provider": "google",
            "api_key": "AIza***",
            "override_params": {"model": "gemini-2.0-flash-exp"}
        },
        {
            "provider": "openai",
            "api_key": "sk-***",
            "override_params": {"model": "gpt-4o"}
        }
    ]
}

client = Portkey().with_options(config=config)

Load Balancing

Balance between different Gemini models:
config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {
            "provider": "google",
            "api_key": "AIza***",
            "override_params": {"model": "gemini-1.5-pro"},
            "weight": 0.3
        },
        {
            "provider": "google",
            "api_key": "AIza***",
            "override_params": {"model": "gemini-1.5-flash"},
            "weight": 0.7
        }
    ]
}

client = Portkey().with_options(config=config)

Error Handling

from portkey_ai.exceptions import (
    RateLimitError,
    APIError,
    AuthenticationError
)

try:
    response = client.chat.completions.create(
        model="gemini-2.0-flash-exp",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limit: {e}")
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except APIError as e:
    print(f"API error: {e}")

Key Features

Context Windows

| Model | Context Window | Notes |
| --- | --- | --- |
| gemini-1.5-pro | 2,097,152 tokens | Largest available |
| gemini-1.5-flash | 1,048,576 tokens | Fast processing |
| gemini-2.0-flash-exp | 1,048,576 tokens | Latest generation |
| gemini-2.0-flash-thinking-exp | 32,768 tokens | Reasoning focused |
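As a rough pre-flight check, you can estimate whether a document fits a model's context window. The ~4 characters per token heuristic below is an approximation for English text, not Gemini's actual tokenizer:

```python
# Context window sizes from the table above.
CONTEXT_WINDOWS = {
    "gemini-1.5-pro": 2_097_152,
    "gemini-1.5-flash": 1_048_576,
    "gemini-2.0-flash-exp": 1_048_576,
    "gemini-2.0-flash-thinking-exp": 32_768,
}

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def fits_in_context(text: str, model: str, reserve_for_output: int = 8192) -> bool:
    # Leave headroom for the model's own output tokens.
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_WINDOWS[model]

print(fits_in_context("word " * 100_000, "gemini-1.5-flash"))  # → True
```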

Safety Settings

Gemini includes built-in safety filters. Responses may be blocked if content violates safety thresholds.
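When a response is blocked, OpenAI-compatible gateways typically surface it through the choice's `finish_reason`. A minimal detection sketch; the `"content_filter"` value and the stubbed response object below are assumptions, so inspect real Portkey responses to confirm the exact value:

```python
from types import SimpleNamespace

def was_safety_blocked(response) -> bool:
    # A blocked response usually carries a filter finish_reason or empty content.
    choice = response.choices[0]
    return choice.finish_reason == "content_filter" or choice.message.content is None

# Minimal stand-in for a blocked response object, for illustration only.
blocked = SimpleNamespace(choices=[SimpleNamespace(
    finish_reason="content_filter",
    message=SimpleNamespace(content=None),
)])
print(was_safety_blocked(blocked))  # → True
```

Pair this check with a fallback target (see Fallback Configuration above) so blocked requests can be retried elsewhere.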

Rate Limits

  • Free tier: 15 requests per minute
  • Pay-as-you-go: Higher limits based on usage
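A generic exponential-backoff wrapper helps stay under these limits. This is a sketch using a stand-in flaky function rather than a real API call; the delay values are illustrative:

```python
import time

def with_retries(fn, max_attempts=4, base_delay=1.0, retry_on=(Exception,)):
    # Retry fn with exponentially growing delays: base_delay, 2x, 4x, ...
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Stand-in for a rate-limited API call: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429: rate limited")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # → ok
```

In practice you would pass a lambda wrapping `client.chat.completions.create(...)` and restrict `retry_on` to rate-limit errors.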

Best Practices

  1. Use Flash for speed - Gemini Flash is significantly faster
  2. Leverage long context - Process entire documents in one request
  3. Multimodal inputs - Combine text, images, and more
  4. System instructions - Guide behavior with clear instructions
  5. Handle safety blocks - Implement fallbacks for blocked responses
  6. Use embeddings - text-embedding-004 for semantic search
  7. Stream responses - Better UX for long generations

Gemini vs Vertex AI

| Feature | Google AI (Gemini) | Vertex AI |
| --- | --- | --- |
| Access | Google AI Studio API key | GCP Service Account |
| Pricing | Pay-per-request | Enterprise pricing |
| Features | Core features | Additional enterprise features |
| Authentication | API key | OAuth 2.0, Service Accounts |
| Use Case | Development, small apps | Production, enterprise |

For enterprise deployments, consider using Google Vertex AI which offers additional features like fine-tuning, private endpoints, and SLA.

Pricing

Gemini offers competitive pricing with a free tier:

  • Gemini Pricing: View detailed pricing for all Gemini models

Related Guides

  • Google Vertex AI: Enterprise Gemini through GCP
  • Function Calling: Advanced function calling
  • Vision Guide: Working with images
  • Fallbacks: Fallback configurations
