
Overview

The Gemini Provider integrates Google’s Gemini API into LLM Gateway Core, providing access to state-of-the-art language models hosted on Google’s infrastructure.

Features

  • Model: gemini-2.5-flash - Fast, efficient conversational AI
  • Async Support: Full async/await pattern for high concurrency
  • Conversation History: Maintains multi-turn chat context
  • Error Handling: Comprehensive exception handling with detailed logging

Configuration

Environment Variables

The Gemini provider requires an API key from Google AI Studio.
1. Get API Key

   Visit Google AI Studio and create an API key.

2. Add to .env

   Add your API key to the .env file:

   GEMINI_API_KEY=your-api-key-here

3. Restart Gateway

   Restart the gateway service to load the new configuration:

   uvicorn app.main:app --reload
Never commit your API key to version control. Always use environment variables or secret management services.

Configuration Settings

app/core/config.py
class Settings(BaseSettings):
    GEMINI_API_KEY: str = ""  # Required for Gemini provider
    PROVIDER_TIMEOUT_SECONDS: int = 60
    PROVIDER_MAX_RETRIES: int = 3
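
The resolution order is the usual pydantic BaseSettings behavior: an environment variable (or .env entry) wins, and the class default applies when it is unset. The sketch below illustrates that behavior with a stdlib-only stand-in (a dataclass instead of BaseSettings, so it runs without pydantic installed); the field names match the Settings class above.

```python
import os
from dataclasses import dataclass, field

# Stdlib stand-in for the BaseSettings resolution shown above:
# each field reads its environment variable and falls back to a default.
@dataclass
class Settings:
    GEMINI_API_KEY: str = field(
        default_factory=lambda: os.environ.get("GEMINI_API_KEY", "")
    )
    PROVIDER_TIMEOUT_SECONDS: int = field(
        default_factory=lambda: int(os.environ.get("PROVIDER_TIMEOUT_SECONDS", "60"))
    )
    PROVIDER_MAX_RETRIES: int = field(
        default_factory=lambda: int(os.environ.get("PROVIDER_MAX_RETRIES", "3"))
    )

os.environ["GEMINI_API_KEY"] = "your-api-key-here"
settings = Settings()
```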

Implementation

Source Code

Here’s the complete implementation of the Gemini provider:
app/providers/gemini.py
import google.generativeai as genai
from app.providers.base import LLMProvider
from app.api.v1.schemas import ChatRequest, ChatResponse, Usage
from app.core.config import settings
import uuid

class GeminiProvider(LLMProvider):
    def __init__(self):
        # Configuration is deferred to chat() to ensure latest .env values
        pass

    @property
    def name(self) -> str:
        return "gemini"
    
    async def chat(self, request: ChatRequest) -> ChatResponse:
        """
        Real Gemini API call.
        """
        try:
            # Configure API key
            genai.configure(api_key=settings.GEMINI_API_KEY)
            model = genai.GenerativeModel('gemini-2.5-flash')

            # Build conversation history
            history = []
            for msg in request.messages[:-1]:
                role = "user" if msg.role == "user" else "model"
                history.append({
                    "role": role, 
                    "parts": [{"text": msg.content}]
                })
            
            # Extract last message
            last_message = request.messages[-1].content
            
            # Start chat session and send message
            chat_session = model.start_chat(history=history)
            response = await chat_session.send_message_async(last_message)
            
            # Return standardized response
            return ChatResponse(
                id=str(uuid.uuid4()),
                provider=self.name,
                content=response.text,
                usage=Usage(
                    prompt_tokens=0, 
                    completion_tokens=0,
                    total_tokens=0
                )
            )
        except Exception as e:
            import traceback
            print(f"[Gemini Error Detailed]\n{traceback.format_exc()}")
            raise

Key Implementation Details

Gemini expects a specific message format:
{
    "role": "user" | "model",  # "model" instead of "assistant"
    "parts": [{"text": "message content"}]
}
The provider converts standard ChatRequest messages to this format.
  • All messages except the last are passed as history to start_chat()
  • The last message is sent via send_message_async()
  • This maintains conversation context across turns
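
The split described above can be sketched as a standalone helper. The provider inlines this logic in chat(); split_for_gemini is a hypothetical name used here for illustration.

```python
def split_for_gemini(messages: list[dict]) -> tuple[list[dict], str]:
    """Convert gateway-style messages into (history, last_message) for Gemini.

    All but the last message become history entries, with "assistant"
    mapped to Gemini's "model" role; the final message is sent separately
    via send_message_async().
    """
    history = [
        {
            "role": "user" if m["role"] == "user" else "model",
            "parts": [{"text": m["content"]}],
        }
        for m in messages[:-1]
    ]
    return history, messages[-1]["content"]
```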
The API key is configured on each request (not in __init__) to ensure the latest environment variable values are used, supporting hot-reloading during development.
Currently returns zeros for token counts. This can be enhanced by parsing Gemini’s response metadata if token tracking is needed.
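
The google-generativeai SDK exposes token counts on response.usage_metadata (prompt_token_count, candidates_token_count, total_token_count). A minimal sketch of populating real usage, defensive in case the metadata is absent (usage_from_response is a hypothetical helper name; the stub only demonstrates the response shape):

```python
from types import SimpleNamespace

def usage_from_response(response) -> dict:
    """Read token counts from Gemini's response.usage_metadata, if present.

    Falls back to zeros when the metadata is missing, matching the
    provider's current behavior.
    """
    meta = getattr(response, "usage_metadata", None)
    return {
        "prompt_tokens": getattr(meta, "prompt_token_count", 0) or 0,
        "completion_tokens": getattr(meta, "candidates_token_count", 0) or 0,
        "total_tokens": getattr(meta, "total_token_count", 0) or 0,
    }

# Stub mimicking the shape of a real Gemini response object:
stub = SimpleNamespace(usage_metadata=SimpleNamespace(
    prompt_token_count=12, candidates_token_count=34, total_token_count=46))
```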

Usage

Routing to Gemini

The gateway routes a request to Gemini when its model field is set to "gemini":
{
  "model": "gemini",
  "messages": [...]
}
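
The gateway's routing code is not shown on this page; a hypothetical sketch of dispatch on the model field might look like the following (the stub classes stand in for the real provider implementations):

```python
# Hypothetical provider registry keyed by the request's "model" field.
class GeminiProvider:
    name = "gemini"

class OllamaProvider:
    name = "ollama"

PROVIDERS = {p.name: p for p in (GeminiProvider(), OllamaProvider())}

def resolve_provider(request: dict):
    """Look up the provider named in the request, or fail loudly."""
    try:
        return PROVIDERS[request["model"]]
    except KeyError:
        raise ValueError(f"Unknown provider: {request['model']!r}")
```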

Example Request

curl -X POST http://localhost:8000/v1/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-gateway-123" \
  -d '{
    "model": "gemini",
    "messages": [
      {"role": "user", "content": "What is FastAPI?"}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'

Example Response

{
  "id": "a3b8c9d2-e4f5-6a7b-8c9d-0e1f2a3b4c5d",
  "provider": "gemini",
  "content": "FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.7+ based on standard Python type hints...",
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

Error Handling

The provider includes comprehensive error handling:
try:
    # Gemini API call
    ...
except Exception as e:
    import traceback
    print(f"[Gemini Error Detailed]\n{traceback.format_exc()}")
    raise

Common Errors

Error: google.generativeai.types.generation_types.BlockedPromptException
Solution: The prompt was blocked by Gemini's safety filters; reword the prompt or adjust the model's safety settings

Error: 400 API key not valid
Solution: Verify your GEMINI_API_KEY is correct and active

Error: 429 Too Many Requests
Solution: Implement retry logic or upgrade your API quota

Error: TimeoutError
Solution: Increase PROVIDER_TIMEOUT_SECONDS in settings
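
The retry logic suggested for 429 responses can be sketched as an exponential-backoff wrapper around any async call. This is a minimal illustration, not gateway code; with_retries is a hypothetical helper, and max_retries mirrors PROVIDER_MAX_RETRIES from the settings.

```python
import asyncio
import random

async def with_retries(call, max_retries: int = 3, base_delay: float = 1.0):
    """Retry an async callable with exponential backoff plus jitter.

    Waits base_delay * 2**attempt (plus a small random jitter) between
    attempts, and re-raises the last error once max_retries is exhausted.
    """
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except Exception:
            if attempt == max_retries:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```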

Model Information

gemini-2.5-flash

  • Speed: Very fast response times
  • Context Window: Large context support
  • Capabilities: Text generation, conversation, reasoning
  • Best For: Production applications requiring speed and quality
To use a different Gemini model, modify the model name in gemini.py:22:
model = genai.GenerativeModel('gemini-2.5-pro')  # Or other model

Next Steps

Ollama Provider

Learn about local model deployment

Custom Providers

Implement your own provider

Rate Limiting

Configure rate limits

Caching

Enable response caching
