## Overview
The Gemini Provider integrates Google’s Gemini API into LLM Gateway Core, providing access to state-of-the-art language models hosted on Google’s infrastructure.

## Features

- **Model**: `gemini-2.5-flash`, a fast, efficient conversational model
- **Async Support**: full async/await pattern for high concurrency
- **Conversation History**: maintains multi-turn chat context
- **Error Handling**: comprehensive exception handling with detailed logging
## Configuration

### Environment Variables

The Gemini provider requires an API key from Google AI Studio.

### Get API Key

Visit Google AI Studio and create an API key.
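The key is then supplied to the gateway via the `GEMINI_API_KEY` environment variable (the variable name used throughout this page), for example:

```shell
# Export in your shell, or place the assignment in a .env file.
export GEMINI_API_KEY="your-api-key-here"
```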
### Configuration Settings

`app/core/config.py`:
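The original file contents are not reproduced here; below is a minimal, stdlib-only sketch covering the two settings referenced elsewhere on this page, `GEMINI_API_KEY` and `PROVIDER_TIMEOUT_SECONDS` (the 30-second default is an assumption):

```python
import os


class Settings:
    """Minimal settings object, read from environment variables."""

    def __init__(self) -> None:
        # API key for Google's Gemini API (from Google AI Studio).
        self.GEMINI_API_KEY: str = os.environ.get("GEMINI_API_KEY", "")
        # Per-request timeout for provider calls; default is an assumption.
        self.PROVIDER_TIMEOUT_SECONDS: float = float(
            os.environ.get("PROVIDER_TIMEOUT_SECONDS", "30")
        )


settings = Settings()
```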
## Implementation

### Source Code

The complete implementation of the Gemini provider lives in `app/providers/gemini.py`.
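The original listing is not preserved here; the sketch below, assuming the `google-generativeai` SDK and simplified `Message`/`ChatRequest` types, illustrates the behaviors described in this section (history conversion, per-request key configuration, and the history/last-message split):

```python
import os
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str


@dataclass
class ChatRequest:
    messages: list


GEMINI_MODEL = "gemini-2.5-flash"


def to_gemini_history(messages):
    """Map gateway messages onto Gemini's {role, parts} format.

    Gemini labels assistant turns with the role "model".
    """
    return [
        {"role": "model" if m.role == "assistant" else "user",
         "parts": [m.content]}
        for m in messages
    ]


class GeminiProvider:
    async def chat(self, request: ChatRequest) -> str:
        # Import and configure per request so the latest GEMINI_API_KEY
        # value is used (supports hot-reloading during development).
        import google.generativeai as genai
        genai.configure(api_key=os.environ["GEMINI_API_KEY"])
        model = genai.GenerativeModel(GEMINI_MODEL)

        # All messages except the last seed the chat history; the final
        # message is the one actually sent to the model.
        *history, last = to_gemini_history(request.messages)
        chat = model.start_chat(history=history)
        response = await chat.send_message_async(last["parts"][0])
        return response.text
```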
### Key Implementation Details

#### Message Format Conversion

Gemini expects a specific message format: each turn is a dict with a `role` and a list of `parts`, with assistant turns labeled `"model"`. The provider converts standard `ChatRequest` messages to this format.
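For instance, a two-turn exchange in Gemini’s history format looks like this (values are illustrative):

```python
# Gemini chat history: a list of {"role", "parts"} dicts; assistant
# turns carry the role "model" rather than "assistant".
gemini_history = [
    {"role": "user", "parts": ["What is the capital of France?"]},
    {"role": "model", "parts": ["The capital of France is Paris."]},
]
```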
#### Conversation History

- All messages except the last are passed as history to `start_chat()`
- The last message is sent via `send_message_async()`
- This maintains conversation context across turns
#### Async Configuration

The API key is configured on each request (not in `__init__`) to ensure the latest environment variable values are used, supporting hot-reloading during development.
#### Usage Statistics

The provider currently returns zeros for token counts. This can be enhanced by parsing Gemini’s response metadata if token tracking is needed.
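One way to add token tracking, sketched under the assumption that responses from the `google-generativeai` SDK expose a `usage_metadata` attribute with `prompt_token_count`, `candidates_token_count`, and `total_token_count` (the helper name is hypothetical):

```python
def extract_usage(response):
    """Read token counts from a Gemini response, falling back to zeros."""
    meta = getattr(response, "usage_metadata", None)
    if meta is None:
        # Preserve the current behavior when metadata is unavailable.
        return {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    return {
        "prompt_tokens": meta.prompt_token_count,
        "completion_tokens": meta.candidates_token_count,
        "total_tokens": meta.total_token_count,
    }
```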
## Usage

### Routing to Gemini

The gateway routes requests to Gemini when:

- an **Explicit Model** is specified, or
- a **Model Hint** matches a Gemini model
### Example Request
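The original example is not preserved here; below is a hypothetical request body, assuming a chat-style gateway endpoint (field names are illustrative):

```json
{
  "model": "gemini-2.5-flash",
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ]
}
```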
### Example Response
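A hypothetical response shape (field names are illustrative; note the zeroed token counts described under Usage Statistics above):

```json
{
  "model": "gemini-2.5-flash",
  "content": "The capital of France is Paris.",
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
```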
## Error Handling

The provider includes comprehensive error handling.

### Common Errors
#### Invalid API Key
**Error:** `google.generativeai.types.generation_types.BlockedPromptException`

**Solution:** Verify your `GEMINI_API_KEY` is correct and active.
#### Rate Limiting
**Error:** `429 Too Many Requests`

**Solution:** Implement retry logic or upgrade your API quota.
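A minimal sketch of such retry logic, with jittered exponential backoff (the helper name and defaults are assumptions, not part of the gateway):

```python
import asyncio
import random


async def with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry an async callable, backing off exponentially between failures."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the original error.
            # Jittered exponential backoff: ~1s, ~2s, ~4s, ...
            await asyncio.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
```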
#### Timeout
**Error:** `TimeoutError`

**Solution:** Increase `PROVIDER_TIMEOUT_SECONDS` in settings.

## Model Information
### gemini-2.5-flash

- **Speed**: very fast response times
- **Context Window**: large context support
- **Capabilities**: text generation, conversation, reasoning
- **Best For**: production applications requiring speed and quality
To use a different Gemini model, modify the model name in `gemini.py:22`.

## Next Steps
- **Ollama Provider**: Learn about local model deployment
- **Custom Providers**: Implement your own provider
- **Rate Limiting**: Configure rate limits
- **Caching**: Enable response caching