Installation
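A typical installation command (the `google-genai` extra name follows Graphiti's published extras; verify against the project README for your version):

```bash
pip install "graphiti-core[google-genai]"
```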
Install Graphiti with the Gemini extra (`graphiti-core[google-genai]`).
Configuration
Environment Variables
.env
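A minimal `.env` file (the variable name matches the configuration tables below; adjust to your setup):

```
GOOGLE_API_KEY=your-google-api-key
```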
Complete Setup
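A sketch wiring Gemini into all three roles (module paths follow graphiti-core's documented layout; verify against your installed version, and replace the Neo4j credentials with your own):

```python
import os

from graphiti_core import Graphiti
from graphiti_core.cross_encoder.gemini_reranker_client import GeminiRerankerClient
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.llm_client.gemini_client import GeminiClient

api_key = os.environ["GOOGLE_API_KEY"]

graphiti = Graphiti(
    "bolt://localhost:7687",  # Neo4j connection placeholders
    "neo4j",
    "password",
    llm_client=GeminiClient(
        config=LLMConfig(api_key=api_key, model="gemini-3-flash-preview")
    ),
    embedder=GeminiEmbedder(
        config=GeminiEmbedderConfig(
            api_key=api_key, embedding_model="text-embedding-001"
        )
    ),
    cross_encoder=GeminiRerankerClient(
        config=LLMConfig(api_key=api_key, model="gemini-2.5-flash-lite")
    ),
)
```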
Gemini can be used for LLM inference, embeddings, and cross-encoding.
Supported Models
Language Models
Gemini 3 (Preview)
- gemini-3-pro-preview: Most capable, 64K output tokens
- gemini-3-flash-preview (recommended): Fast, efficient, 64K output tokens
Gemini 2.5
- gemini-2.5-pro: Advanced reasoning, 64K output tokens
- gemini-2.5-flash: Balanced performance, 64K output tokens
- gemini-2.5-flash-lite: Fast, cost-effective, 64K output tokens
Gemini 2.0
- gemini-2.0-flash: Fast multimodal, 8K output tokens
- gemini-2.0-flash-lite: Ultra-fast, 8K output tokens
Gemini 1.5
- gemini-1.5-pro: Extended context (2M tokens), 8K output
- gemini-1.5-flash: Fast, 8K output tokens
- gemini-1.5-flash-8b: Smallest, 8K output tokens
Embedding Models
- text-embedding-001 (recommended): General-purpose embeddings
- text-embedding-005: Latest embedding model
- gemini-embedding-001: Multimodal embeddings
Reranking Models
- gemini-2.5-flash-lite (recommended): Optimized for classification
- Any Gemini model with log probabilities support
LLM Configuration
LLM Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | From env | Google API key |
| model | str | "gemini-3-flash-preview" | Primary LLM model |
| small_model | str | "gemini-2.5-flash-lite" | Model for simpler tasks |
| temperature | float | 0.7 | Sampling temperature (0-2) |
| max_tokens | int | Model-specific | Maximum output tokens |
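These options map onto an `LLMConfig` passed to the Gemini client; a sketch (module paths per graphiti-core's documented layout; verify against your installed version):

```python
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.llm_client.gemini_client import GeminiClient

llm_client = GeminiClient(
    config=LLMConfig(
        api_key="your-google-api-key",
        model="gemini-3-flash-preview",
        small_model="gemini-2.5-flash-lite",
        temperature=0.7,
    )
)
```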
Embeddings Configuration
Embedder Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | From env | Google API key |
| embedding_model | str | "text-embedding-001" | Embedding model |
| embedding_dim | int | 768 | Output dimension |
| batch_size | int | 100 | Batch size for embed_content |
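A corresponding embedder sketch (again, module paths follow graphiti-core's documented layout; verify for your version):

```python
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig

embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(
        api_key="your-google-api-key",
        embedding_model="text-embedding-001",
        embedding_dim=768,
    )
)
```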
Reranking Configuration
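The scoring mechanism can be sketched in plain Python: the model reports a log probability for a "relevant" judgment, which maps to a [0, 1] score via exponentiation (illustrative only; this is not Graphiti's internal code):

```python
import math

def relevance_score(logprob: float) -> float:
    """Map a log probability (<= 0) for the 'relevant' token
    to a score in [0, 1]. Toy sketch only."""
    return math.exp(logprob)

# Rank candidate passages by the model-reported logprob of "relevant".
candidates = [("passage a", -0.11), ("passage b", -2.30)]
ranked = sorted(candidates, key=lambda c: relevance_score(c[1]), reverse=True)
print([name for name, _ in ranked])
```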
Gemini’s reranker uses log probabilities for relevance scoring.
Thinking Configuration (Gemini 2.5+)
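One way to set a thinking budget is directly through the google-genai SDK; a sketch (`ThinkingConfig` and `thinking_budget` are google-genai parameters, not Graphiti options; verify against your SDK version):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="your-google-api-key")

# Allow up to 1024 tokens of internal reasoning before the answer.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the trade-offs of graph databases.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```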
Models that support thinking (Gemini 2.5+) can allocate an extended reasoning budget before answering.
Structured Output Support
Gemini supports native structured output via JSON schema:
- Native JSON mode with schema validation
- Automatic partial JSON salvaging
- Retry logic for malformed responses
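To illustrate what partial JSON salvaging involves, here is a toy sketch (not Graphiti's actual implementation) that closes unterminated strings and brackets in a truncated response:

```python
import json

def salvage_partial_json(text: str):
    """Try to recover a value from truncated JSON by closing
    unterminated strings and brackets. Toy sketch only."""
    stack = []            # closing brackets still owed, in open order
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    repaired = text.rstrip()
    if in_string:
        repaired += '"'
    elif repaired.endswith(","):
        repaired = repaired[:-1]      # drop a dangling comma
    repaired += "".join(reversed(stack))
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None                   # unrecoverable

print(salvage_partial_json('{"name": "Alice", "tags": ["reader", "wri'))
```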
Complete Example
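A minimal end-to-end sketch, assuming a local Neo4j instance and `GOOGLE_API_KEY` set (calls follow graphiti-core's documented interface; verify names against your installed version):

```python
import asyncio
import os
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.llm_client.gemini_client import GeminiClient
from graphiti_core.nodes import EpisodeType

async def main():
    api_key = os.environ["GOOGLE_API_KEY"]
    graphiti = Graphiti(
        "bolt://localhost:7687", "neo4j", "password",  # Neo4j placeholders
        llm_client=GeminiClient(
            config=LLMConfig(api_key=api_key, model="gemini-3-flash-preview")
        ),
        embedder=GeminiEmbedder(config=GeminiEmbedderConfig(api_key=api_key)),
    )
    try:
        await graphiti.build_indices_and_constraints()
        await graphiti.add_episode(
            name="intro",
            episode_body="Graphiti builds temporally-aware knowledge graphs.",
            source=EpisodeType.text,
            source_description="documentation snippet",
            reference_time=datetime.now(timezone.utc),
        )
        results = await graphiti.search("What does Graphiti build?")
        for edge in results:
            print(edge.fact)
    finally:
        await graphiti.close()

asyncio.run(main())
```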
Error Handling
Graphiti automatically handles:
- Rate Limit Errors: Exponential backoff and retry
- Safety Blocks: Content filtered by safety settings
- Prompt Blocks: Prompts blocked before processing
- Truncation: Partial JSON salvaging from truncated responses
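The rate-limit strategy can be illustrated with a minimal sketch (the names are hypothetical, not Graphiti's internals):

```python
import random
import time

def retry_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run fn(), retrying failures with exponential backoff plus jitter.
    Hypothetical helper illustrating the strategy."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise                     # out of attempts
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)

# Simulate a call that is rate-limited twice, then succeeds.
attempts = []
def flaky_call():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("429: rate limited")
    return "ok"

result = retry_with_backoff(flaky_call, sleep=lambda _: None)
print(result, len(attempts))  # ok 3
```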
Safety Settings
Gemini has built-in safety filters. If content is blocked, the block reason is surfaced so you can adjust the input or your safety settings.
Maximum Output Tokens
| Model Family | Max Output Tokens |
|---|---|
| Gemini 3 | 65,536 (64K) |
| Gemini 2.5 | 65,536 (64K) |
| Gemini 2.0 | 8,192 (8K) |
| Gemini 1.5 | 8,192 (8K) |
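If you need to cap `max_tokens` by model family, the table above can be encoded as a small helper (a hypothetical utility, not part of Graphiti):

```python
# Output-token ceilings from the table above, most specific prefix first.
FAMILY_LIMITS = [
    ("gemini-3", 65536),
    ("gemini-2.5", 65536),
    ("gemini-2.0", 8192),
    ("gemini-1.5", 8192),
]

def max_output_tokens(model: str) -> int:
    """Return the maximum output tokens for a Gemini model name."""
    for prefix, limit in FAMILY_LIMITS:
        if model.startswith(prefix):
            return limit
    raise ValueError(f"unknown Gemini family: {model}")

print(max_output_tokens("gemini-2.5-flash"))  # 65536
```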
When to Use Gemini
Choose Gemini if you:
- Need multimodal capabilities (image, video, audio)
- Want extended context windows (1-2M tokens)
- Prefer Google’s safety and content filtering
- Need native JSON schema support
- Want to use Google Cloud infrastructure
Choose OpenAI instead if you:
- Need GPT-5 reasoning models
- Want faster response times
- Prefer OpenAI’s ecosystem
Cost Optimization
- Use Flash Models: Gemini Flash is fast and cost-effective
- Batch Embeddings: Use batch operations for embeddings
- Adjust Thinking Tokens: Limit thinking tokens for reasoning models
- Monitor Usage: Track API usage via Google Cloud Console
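Batching embedding inputs can be sketched as follows (a hypothetical helper mirroring the embedder's default batch_size of 100):

```python
def batches(items, size=100):
    """Yield consecutive slices of at most `size` items, mirroring
    the embedder's default batch_size of 100. Hypothetical helper."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

texts = [f"document {i}" for i in range(250)]
chunks = list(batches(texts, size=100))
print([len(c) for c in chunks])  # [100, 100, 50]
```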