Google Gemini provides state-of-the-art multimodal AI models with strong reasoning, structured output, and embedding capabilities.

Installation

Install Graphiti with Gemini support:
pip install graphiti-core[google-genai]

Configuration

Environment Variables

.env
GOOGLE_API_KEY=AIza...

Complete Setup

Gemini can be used for LLM inference, embeddings, and cross-encoding:
import os
from graphiti_core import Graphiti
from graphiti_core.llm_client.gemini_client import GeminiClient, LLMConfig
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
from graphiti_core.cross_encoder.gemini_reranker_client import GeminiRerankerClient

# Configure API key
api_key = os.environ["GOOGLE_API_KEY"]

# Initialize Graphiti with Gemini for all components
graphiti = Graphiti(
    "bolt://localhost:7687",
    "neo4j",
    "password",
    llm_client=GeminiClient(
        config=LLMConfig(
            api_key=api_key,
            model="gemini-2.0-flash"
        )
    ),
    embedder=GeminiEmbedder(
        config=GeminiEmbedderConfig(
            api_key=api_key,
            embedding_model="text-embedding-001"
        )
    ),
    cross_encoder=GeminiRerankerClient(
        config=LLMConfig(
            api_key=api_key,
            model="gemini-2.5-flash-lite"
        )
    )
)

Supported Models

Language Models

Gemini 3 (Preview)

  • gemini-3-pro-preview: Most capable, 64K output tokens
  • gemini-3-flash-preview (recommended): Fast, efficient, 64K output tokens

Gemini 2.5

  • gemini-2.5-pro: Advanced reasoning, 64K output tokens
  • gemini-2.5-flash: Balanced performance, 64K output tokens
  • gemini-2.5-flash-lite: Fast, cost-effective, 64K output tokens

Gemini 2.0

  • gemini-2.0-flash: Fast multimodal, 8K output tokens
  • gemini-2.0-flash-lite: Ultra-fast, 8K output tokens

Gemini 1.5

  • gemini-1.5-pro: Extended context (2M tokens), 8K output
  • gemini-1.5-flash: Fast, 8K output tokens
  • gemini-1.5-flash-8b: Smallest, 8K output tokens

Embedding Models

  • text-embedding-001 (recommended): General-purpose embeddings
  • text-embedding-005: Latest embedding model
  • gemini-embedding-001: Multimodal embeddings

Reranking Models

  • gemini-2.5-flash-lite (recommended): Optimized for classification
  • Any Gemini model that supports log probabilities

LLM Configuration

from graphiti_core.llm_client.gemini_client import GeminiClient, LLMConfig

llm_client = GeminiClient(
    config=LLMConfig(
        api_key="AIza...",
        model="gemini-2.0-flash",
        small_model="gemini-2.5-flash-lite",
        temperature=0.7
    ),
    max_tokens=16384  # Override default
)

LLM Configuration Options

Parameter | Type | Default | Description
api_key | str | From env | Google API key
model | str | "gemini-3-flash-preview" | Primary LLM model
small_model | str | "gemini-2.5-flash-lite" | Model for simpler tasks
temperature | float | 0.7 | Sampling temperature (0-2)
max_tokens | int | Model-specific | Maximum output tokens

Embeddings Configuration

from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig

embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(
        api_key="AIza...",
        embedding_model="text-embedding-001",
        embedding_dim=768  # Default dimension
    ),
    batch_size=100  # Process 100 texts per batch
)

Embedder Configuration Options

Parameter | Type | Default | Description
api_key | str | From env | Google API key
embedding_model | str | "text-embedding-001" | Embedding model
embedding_dim | int | 768 | Output dimension
batch_size | int | 100 | Batch size for embed_content

Reranking Configuration

Gemini’s reranker uses log probabilities for relevance scoring:
from graphiti_core.cross_encoder.gemini_reranker_client import GeminiRerankerClient
from graphiti_core.llm_client.config import LLMConfig

reranker = GeminiRerankerClient(
    config=LLMConfig(
        api_key="AIza...",
        model="gemini-2.5-flash-lite"  # Optimized for classification
    )
)
The reranker uses boolean classification with log probabilities to rank passage relevance, similar to the OpenAI reranker approach.
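
Continuing from the snippet above, here is a usage sketch. It assumes the cross-encoder's async rank method, which takes a query and candidate passages and returns scored passages; the exact return shape is an assumption:

import asyncio

async def demo() -> None:
    passages = [
        "Gemini 2.5 models support up to 64K output tokens.",
        "Neo4j is a graph database used by Graphiti.",
        "The reranker scores passages with log probabilities.",
    ]
    # rank() is assumed to return (passage, score) pairs ordered by relevance
    ranked = await reranker.rank(
        query="How does the Gemini reranker score relevance?",
        passages=passages,
    )
    for passage, score in ranked:
        print(f"{score:.3f}  {passage}")

asyncio.run(demo())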

Thinking Configuration (Gemini 2.5+)

For models that support thinking (Gemini 2.5+), enable extended reasoning:
from google.genai import types
from graphiti_core.llm_client.gemini_client import GeminiClient, LLMConfig

llm_client = GeminiClient(
    config=LLMConfig(model="gemini-2.5-pro"),
    thinking_config=types.ThinkingConfig(
        include_thoughts=False,  # Keep intermediate thoughts out of responses
        thinking_budget=2048     # Cap the number of thinking tokens
    )
)

Structured Output Support

Gemini supports native structured output via JSON schema. Graphiti automatically:
  • Converts Pydantic models to JSON schema
  • Sets response_mime_type to "application/json"
  • Validates responses against the schema
  • Handles truncation and salvages partial JSON
Benefits:
  • Native JSON mode with schema validation
  • Automatic partial JSON salvaging
  • Retry logic for malformed responses
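
To illustrate the underlying mechanism, here is a minimal sketch using the google-genai SDK directly. Graphiti performs the equivalent conversion internally; the model name and schema below are placeholders:

import os
from google import genai
from pydantic import BaseModel

class Company(BaseModel):
    name: str
    product: str

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Extract the company and its product: Google announced Gemini 3.0 with enhanced multimodal capabilities.",
    config={
        "response_mime_type": "application/json",
        "response_schema": Company,
    },
)
print(response.parsed)  # Parsed into a Company instance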

Complete Example

import asyncio
import os
from datetime import datetime, timezone
from graphiti_core import Graphiti
from graphiti_core.llm_client.gemini_client import GeminiClient, LLMConfig
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
from graphiti_core.nodes import EpisodeType

async def main():
    api_key = os.environ["GOOGLE_API_KEY"]
    
    # Configure Gemini LLM
    llm_client = GeminiClient(
        config=LLMConfig(
            api_key=api_key,
            model="gemini-2.0-flash",
            temperature=0.7
        )
    )
    
    # Configure Gemini embeddings
    embedder = GeminiEmbedder(
        config=GeminiEmbedderConfig(
            api_key=api_key,
            embedding_model="text-embedding-001"
        )
    )
    
    # Initialize Graphiti
    graphiti = Graphiti(
        "bolt://localhost:7687",
        "neo4j",
        "password",
        llm_client=llm_client,
        embedder=embedder
    )
    
    try:
        # Add an episode
        await graphiti.add_episode(
            name="AI News 1",
            episode_body="Google announced Gemini 3.0 with enhanced multimodal capabilities.",
            source=EpisodeType.text,
            reference_time=datetime.now(timezone.utc)
        )
        
        # Search the graph
        results = await graphiti.search("What are Gemini 3.0's features?")
        for result in results:
            print(f"Fact: {result.fact}")
    
    finally:
        await graphiti.close()

if __name__ == "__main__":
    asyncio.run(main())

Error Handling

Graphiti automatically handles:
  • Rate Limit Errors: Exponential backoff and retry
  • Safety Blocks: Content filtered by safety settings
  • Prompt Blocks: Prompts blocked before processing
  • Truncation: Partial JSON salvaging from truncated responses
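
These cases are handled inside Graphiti, but application code can still catch the terminal errors once built-in retries are exhausted. A minimal sketch, assuming RateLimitError is importable from graphiti_core.llm_client.errors (the import path is an assumption):

import asyncio
from datetime import datetime, timezone

from graphiti_core.llm_client.errors import RateLimitError  # assumed import path
from graphiti_core.nodes import EpisodeType

async def add_with_backoff(graphiti, name: str, body: str, retries: int = 3) -> None:
    for attempt in range(retries):
        try:
            await graphiti.add_episode(
                name=name,
                episode_body=body,
                source=EpisodeType.text,
                reference_time=datetime.now(timezone.utc),
            )
            return
        except RateLimitError:
            # Graphiti's own retries were exhausted; back off before trying the episode again
            await asyncio.sleep(2 ** attempt * 10)
    raise RuntimeError(f"Rate limited after {retries} attempts")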

Safety Settings

Gemini has built-in safety filters. If content is blocked, the exception indicates the safety category:
  • HARM_CATEGORY_HARASSMENT
  • HARM_CATEGORY_HATE_SPEECH
  • HARM_CATEGORY_SEXUALLY_EXPLICIT
  • HARM_CATEGORY_DANGEROUS_CONTENT
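
A rough sketch of detecting a block in application code. The exact exception type Graphiti raises for a safety block is not pinned down here, so the message text is inspected instead:

from datetime import datetime, timezone
from graphiti_core.nodes import EpisodeType

async def ingest_user_text(graphiti, user_text: str) -> bool:
    try:
        await graphiti.add_episode(
            name="User report",
            episode_body=user_text,
            source=EpisodeType.text,
            reference_time=datetime.now(timezone.utc),
        )
        return True
    except Exception as exc:
        # Assumption: safety and prompt blocks surface the HARM_CATEGORY_* name in the message
        if "HARM_CATEGORY" in str(exc):
            print(f"Episode blocked by Gemini safety filters: {exc}")
            return False
        raise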

Maximum Output Tokens

Model Family | Max Output Tokens
Gemini 3 | 65,536 (64K)
Gemini 2.5 | 65,536 (64K)
Gemini 2.0 | 8,192 (8K)
Gemini 1.5 | 8,192 (8K)
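
To use the larger 64K output window on newer models, raise the max_tokens override shown in the LLM configuration section to match. A small sketch reusing the GeminiClient constructor from earlier:

from graphiti_core.llm_client.gemini_client import GeminiClient, LLMConfig

# Gemini 2.5 models accept up to 65,536 output tokens
llm_client = GeminiClient(
    config=LLMConfig(api_key="AIza...", model="gemini-2.5-flash"),
    max_tokens=65536
)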

When to Use Gemini

Choose Gemini if you:
  • Need multimodal capabilities (image, video, audio)
  • Want extended context windows (1-2M tokens)
  • Prefer Google’s safety and content filtering
  • Need native JSON schema support
  • Want to use Google Cloud infrastructure
Choose OpenAI if you:
  • Need GPT-5 reasoning models
  • Want faster response times
  • Prefer OpenAI’s ecosystem

Cost Optimization

  • Use Flash Models: Gemini Flash is fast and cost-effective
  • Batch Embeddings: Use batch operations for embeddings (see the sketch after this list)
  • Adjust Thinking Tokens: Limit thinking tokens for reasoning models
  • Monitor Usage: Track API usage via Google Cloud Console
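
A rough sketch of batched embedding, assuming the embedder exposes the create_batch method of Graphiti's embedder interface (the method name is an assumption here):

import asyncio
import os
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig

async def main() -> None:
    embedder = GeminiEmbedder(
        config=GeminiEmbedderConfig(
            api_key=os.environ["GOOGLE_API_KEY"],
            embedding_model="text-embedding-001",
        ),
        batch_size=100,  # Texts sent per embed_content call
    )
    texts = [f"Document {i}" for i in range(250)]
    # create_batch (assumed) embeds all texts, chunked by batch_size under the hood
    vectors = await embedder.create_batch(texts)
    print(len(vectors), len(vectors[0]))

asyncio.run(main())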
